Dark satanic wings


Just as the dark-coloured peppered moth disappears from northern England, researchers are finally getting to the bottom of how it gained its colour.


Not for nothing was the region of the English midlands north of
Birmingham called the Black Country during the late nineteenth
century. It was the dark polluted heart of the industrial
revolution, according to a railway guide from 1851: “The pleasant
green of pastures is almost unknown, the streams, in which no fishes
swim, are black and unwholesome; the natural dead flat is often broken
by high hills of cinders and spoil from the mines; the few trees
are stunted and blasted; no birds are to be seen, except a few smoky
sparrows; and for miles on miles a black waste spreads around, where
furnaces continually smoke, steam engines thud and hiss, and long
chains clank, while blind gin-horses walk their doleful round”.

A few kilometres to the north, where trees remained, the wildlife
was already adapting to its new human-made environment.
You have all seen the results. The first famous dark-coloured peppered
moth (a staple of textbooks) was recorded in Manchester
in 1848. Half a century later, they were everywhere. The wild-type,
light-coloured and mottled, had disappeared almost to extinction.

There is perhaps no better example of natural selection in action
than the case of the peppered moth (Biston betularia). As the same textbooks
say, the colour of the moths evolved to match their new, sooty,
backgrounds, and thereby camouflage the insects from hungry birds.

The story is not actually quite so clear-cut. Geneticists have
squabbled over the details for decades (the strength of the evidence
for the assumed choice of the birds, for example) and some of these
technical criticisms have leaked, out of context, into the propaganda
of creationists. In response, some textbooks have, heaven forbid,
evolved to not include the peppered-moth story at all.

Among the holes in the story was the identity of the gene that was
involved in producing the dark-coloured (melanic) moth variant.
Extensive mapping has pinned it down to a 400-kilobase region
containing 13 genes, none of which had any obvious role in wing
coloration. Undeterred, scientists went on to isolate the gene responsible,
and they describe their search on page 102 of this issue. It is
called cortex, orthologous to a gene of the same name in Drosophila.
The researchers have even gone further, and shown that the specific
cause of the mutation is the insertion of a transposable element
(popularly, a ‘jumping gene’) into the first intron of the cortex gene.

The insertion leads to increased transcription of the gene during
a phase of development when the wing discs are forming. The cortex
gene, then, is involved in wing development, but there is still no obvious
association with coloration. In Drosophila, cortex is involved in
cell-cycle regulation, in particular, marking proteins that are redundant
in the cell cycle as being ready for disposal. What is going on?

Work from a different group of Lepidoptera might offer a solution.
In a study described on page 106, another group of researchers shows
that cortex is a key player in the coloration of the wings of butterflies
in the genus Heliconius, long a favourite for the study of mimicry.
They show that cortex is a member of a fast-evolving scion of an
otherwise conservative group of cell-cycle regulator genes known as
the fizzy family, a name redolent of activity, growth and fervour, and
possibly involved in the regulation of wing-scale development. This
is important, because it is the size, density and surface properties of
the wing scales that determine colour in butterflies and moths. Flies,
such as Drosophila, lack these structures, perhaps explaining why it
was initially hard to associate the cortex gene with wing development.

There is a further, satisfying twist to the tale. Although it is
possible that melanic mutants existed undetected at a very low
level in the peppered-moth population for centuries, the specific
mutation behind their coloration is relatively recent, appearing
around 1819, in plenty of time for it to be noted down in Manchester
a couple of decades later.

Much, of course, remains to be discovered, not least of which is 
the precise mode of action of cortex; how the gene relates to wing- 
scale development; and how the insertion of a transposable element 
contrives to alter this. But there is enough in the pages that follow to 
update those textbooks. Still, future generations of readers will find it 
harder to recognize the high hills of cinders and spoil from the mines 
that drove the change. The air is cleaner these days, ‘Black Country’ is 
no longer an apt description, and the dark-coloured peppered moths 
are vanishing as quickly as they emerged.


Toxic control 


The United States is overhauling its chemicals 
law; now it must tackle carbon emissions. 


The 1976 US Toxic Substances Control Act (TSCA) must
be one of the worst pieces of environmental legislation
ever devised. Rather than empowering the Environmental
Protection Agency (EPA) to ensure that new chemicals are safe,
the law declared all chemicals harmless, unless proven otherwise.
The situation is so preposterous, in fact, that even the normally
dysfunctional US Congress managed to unite last week to advance
reform (see page 18).

The bipartisan TSCA reform bill passed the House of Representatives,
by a vote of 403-12, on 24 May. Although senator Rand Paul
(Republican, Kentucky) has temporarily blocked a vote in the Senate,
the legislation is expected to pass in the coming weeks, clearing the
way for a signature by President Barack Obama. Once that happens,
EPA scientists will at last have the authority to do their jobs.
Rather than watching passively as some 700 new chemicals enter
all corners of the US marketplace each year, the EPA would be able to 
require companies to provide more data and conduct extra research 
to demonstrate the safety of the products. The legislation would also 
bolster review of existing substances. The TSCA inventory currently 
lists some 85,000 chemicals, but no one knows how many are still 
in use today. The EPA would create a new inventory and then sift 
through it to see which ones merit further investigation. 

What is most remarkable about this reform legislation, aside
from the fact that it took so long, is the list of supporters: Democrats
and Republicans, both houses of Congress and the executive
branch, as well as many environmentalists and the chemical industry.
The reason is simple: the companies that manufacture and use
chemicals, once adamantly opposed to such reform bills, have real- 
ized that a viable federal regulatory system is in their financial inter- 
est. The complete lack of public confidence in the EPA’s authority 
under the TSCA has pushed environmental officials at the state level 
to launch their own investigations and regulations. The upshot is 
that without a stronger federal system, the industry faces an increas- 
ingly complex — and uncertain — patchwork of regulations. 

This is all good news for the public, which is bombarded daily by 
news reports, environmental campaigns and scientific studies that 
analyse the danger of one chemical or another in products that they 
purchase every day. It is also good for science. The new law will drive 
research into chemicals of concern, and companies will find it harder 
to claim that the information that they submit is a trade secret. As a
result, more data will enter the public and academic spheres, and that 
is always a good thing. 

Environmentalists pushed to ensure that the EPA's new decisions 
about health risks will be based on health data alone, without regard 
to economic implications. Under the new legislation, the EPA would 
be able to consider economic impacts in any subsequent cost-benefit
analysis only if it moves forward with regulations. And industry
pushed for mandatory deadlines to ensure that decisions are made
in a timely manner. All in all, it’s a reasonable compromise that moves
the regulatory needle in the right direction. 

It is also a blueprint for what ultimately needs to happen to
break the legislative stalemate on what is perhaps the greatest
environmental challenge: the effect of greenhouse gases on climate.
Despite overwhelming evidence showing the need for action, the energy
industry has obstructed and stalled for too long, and the only real
result is prolonged regulatory uncertainty. If major businesses,
including energy producers and consumers, were to get together en
masse and push for regulation, Republican lawmakers would be
forced to pull their heads out of the sand
and think about reasonable solutions that are in line with their own
political values.

Low-carbon energy such as nuclear power and that obtained from 
renewables would benefit the most, but natural gas would also get a 
short-term boost as utilities back further away from coal, which is 
already on the decline. Even coal would see its chances of survival 
increase in the long run, because properly agreed federal regulations 
would bolster the economics and interest in technologies that can be 
used to capture and sequester, or even use, carbon dioxide. At a mini- 
mum, with a legitimate set of rules in place, companies could move 
forward and plan their long-term investments accordingly. 

Everyone could see that the original TSCA bill created a problem. 
It has taken decades, but reform was inevitable. The need for legal 
controls on the generation and control of greenhouse gases is just as 
clear — indeed, that is why the energy industry has fought so hard to 
undermine the evidence. This time, we do not have decades to waste.


Seeing farther 


Our fascination with telescopes and the worlds 
they reveal spreads beyond science into culture. 


Galileo Galilei did not invent the telescope, but he is generally
credited with being the first to point one at the sky and record
what he saw. Which begs a question: just what did the others
before him do with theirs?

Ever since the great man saw and drew the moons of Jupiter in 1610,
astronomers (both amateur and professional) have been captivated
by the night sky. For more than 400 years, through revolution, war
and endless change on Earth, telescopes have brought the rest of the
Universe to us in ever-greater detail. We perch them on the tops of the
highest mountains, strap them to aircraft, dangle them from balloons
and launch them into orbit, all to get a better view of the world outside
our own. We even cut holes in the roofs of our houses for them. The
word ‘telescope’ derives from the Greek for far-seeing, and never can a
scientific instrument have been so well labelled.

On page 34, Bernie Fanaroff, who as the former head of the Square
Kilometre Array South Africa project knows a thing or two about
telescopes, reviews a new account of their development and history.
Eyes on the Sky by Francis Graham-Smith covers the entire spectrum,
from existing instruments to planned ones that gather everything
from long-wavelength radio waves to high-frequency X-rays.
Readers with a taste for the bizarre could also check out Unusual
Telescopes by Peter Manly, published in paperback in 1995. Among
the weird and wonderful designs are telescopes with mirrors made
from polished rock, inflatable telescopes and ornamental telescopes
that double as sundials.

The names of some of the newest additions to the telescope roster 
— some barely off the drawing board — indicate where the field 
is heading. The Very Large Telescope will soon be joined by the 
European Extremely Large Telescope, but not by its cancelled rival, 
the Overwhelmingly Large Telescope. 

But small instruments can be powerful, too, if there are enough of 
them. Maybe the future of astronomy lies not in ever-bigger adverbs but 
in tiny chips: a News story on page 15 offers a glimpse of that perhaps- 
not-too-distant technology. Next month, a package that holds dozens of 
Sprite mini-satellites is scheduled to be sent to the International Space 
Station, from where they will be released. It is a test run to gauge the 
potential of such ‘chipsats’ to swarm and collectively gather data on 
missions. 

Next month will also see a telescope-related launch of a different 
kind: a new festival at the historic UK Jodrell Bank observatory near
Manchester, headlined by the French musician Jean Michel Jarre. It 
was scientists at Jodrell who famously, with the help of a fax machine 
borrowed from the Daily Express, scooped the Soviets and intercepted 
the first pictures of the lunar surface from the Luna 9 mission. The 
glory days of that observatory may be behind it, but its status as an 
iconic landmark demonstrates another feature of telescopes: they pro- 
vide a tangible link not just from astronomers to the Universe but from 
science to the wider public, especially when it involves an enormous 
radio dish. Indeed, the United Kingdom is seeking to have the site's 
cultural significance marked officially: Jodrell Bank is being consid- 
ered for listing as a UNESCO World Heritage Site. 

Telescopes and their discoveries have always spread beyond 
science. Shortly after Galileo drew Jupiter and its four moons, William 
Shakespeare is thought to have completed Cymbeline, one of his final 
plays. At its climax, the god Jupiter descends to the stage, preceded by 
four angels. Science and culture have never looked back, or so far.
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hat use is basic science to the developing world? Why 
West a nation that cannot feed all its people try to senda 
spacecraft to Mars? Instead, scientific research in poorer 
nations is expected to focus on applied problems. Surrounded by poor 
prospects and infrastructure, institutions in these countries support 
fast-producing research that can provide direct results to the economy. 

Much foreign investment goes the same way. For developing 
countries to work on pure science is often viewed from outside as 
indulgent and wasteful. Witness the argument that took place in the 
United Kingdom last year over India — a recipient of British aid — 
developing its own space programme. 

Applied research certainly has its place, in developing as well as 
developed countries. Science as a tool to make money and secure 
a food supply is key to survival. It can target 
local issues: a notable success is the process 
developed by Brazil to turn (locally abundant) 
sugar cane into ethanol as a biofuel resource. 
In southeast Asia, the science of cassava pests 
and diseases is a priority, because millions of 
people here rely on cassava as a staple food and 
a source of income. 

But we should not forget that there is more 
to life than accumulating resources. Many 
other factors threaten human existence — from 
mutating viruses to moving tectonic plates. In 
the Southern Hemisphere, where most develop- 
ing countries are located, natural disasters and 
emerging diseases haunt the lives of billions of 
people. The Ebola epidemic and the Zika virus, 
the Ecuador earthquake and the Aceh tsunami 
are just a few recent examples. And to understand them, we do not 
just need science that has an economic value. We need science that 
questions why the world is the way it is. 

Of course, some in the developing world already study pure science 
problems. In Indonesia, some researchers are analysing the genetics of 
Indonesian people and their susceptibility to certain diseases — work 
that also offers insights into human origins. Others are studying the 
ecology and evolution of non-human primates. But these efforts are 
dwarfed by the many government-funded projects on applied topics 
such as agriculture, pharmacy and animal husbandry. 

Besides the fact that it has less economic value, basic science is not 
encouraged in developing countries because it is expensive. Almost all 
such countries allocate less than 1% of gross domestic product to scien- 
tific research. In 2016, the grant from Indonesia's Ministry of Research 
and Technology for a research project rarely exceeded US$100,000 — 
not enough to buy cutting-edge laboratory equipment. We see a similar 
picture in other developing countries, including many in Africa. 

Things are starting to change. Earlier this year, President 
Joko Widodo of Indonesia signed into existence the Indonesian 


WE NEED 


SCIENCE 


THAT 


QUESTIONS 
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WORLD 


IS THE WAY ITIS. 


. The developing world 
" needs basic research too 


The establishment of an agency in Indonesia that will support ‘frontier 
research’ is awelcome development, argues Dyna Rochmyaningsih. 


Science Foundation (ISF), an independent funding body for science. 
The establishment of the ISF is a monumental event. For the first 
time, Indonesian scientists will have a funding source apart from the 
national budget (of which the proportion going to science is a very low 
0.08%). And, also for the first time, they will get multi-year research 
grants. The amount will be increased, up to $300,000 per successful 
research proposal. As a start, the Ministry of Finance has committed 
to provide $9 million in 2016 for research on life sciences, health and 
nutrition. 

And the most interesting part is that the new funding agency will 
not support applied science. Instead it will pay for ‘frontier research’ 
on the Universe, Earth, climate, the life sciences, health, nutrition, 
materials and computational science. 

The new programme might encourage the 
best Indonesian scientists scattered across 
the developed world to come back. It should 
encourage those in Indonesia to do better 
science. It will certainly grow scientific excel- 
lence in the country. Unlike applied science, the 
goal is not to use research as a tool, but for it to 
become a valuable and self-sustaining pursuit in 
its own right. The ISF is intended to create a sys- 
tem in which scientists can work independently, 
without the need for international support, to 
assess the scientific questions of their own 
land and to contribute to the universal quest 
for knowledge. It offers an opportunity for our 
scientists to stand on their own feet. 

The importance of basic science in poorer 
countries is recognized beyond Indonesia. 
Earlier this year, at a meeting to promote scientific talent in Africa, Mary 
Teuw Niane, minister of higher education and research in Senegal, spoke 
of the need for basic science in his and other developing nations. 

The African Academy of Sciences is working with funders
including the Wellcome Trust and the Bill & Melinda Gates Foundation 
to boost basic research in health care. Last month, some £21 million 
($31 million) was awarded to scientists from Côte d'Ivoire, Kenya,
Senegal and Uganda who are conducting research on emerging infec- 
tious diseases, neonatal and population health, and the elimination 
of malaria. 

It is too early to make predictions, but perhaps we can be optimistic 
that a new focus on basic research will produce a lasting change in sci- 
ence in the global South. Basic science may not give us an instant result 
but it will give us a deeper understanding about the world that changes 
all the time. And it will generate knowledge, which, as policymakers
from across the world insist, is at the heart of the modern economy.


Dyna Rochmyaningsih is a freelance science journalist in Jakarta. 
e-mail: drochmya87@gmail.com 
RESEARCH HIGHLIGHTS

Selections from the scientific literature


Galaxy from the 
cosmic dark ages 


Astronomers have found the 
faintest example yet of a galaxy 
from the early Universe. 

Kuang-Han Huang of the 
University of California, 
Davis, and his colleagues 
spotted the 13-billion-year- 
old galaxy using the Keck 
Observatory in Hawaii and 
the Hubble Space Telescope. 
A cluster of galaxies in 
between acted like a lens, 
gravitationally bending light 
from the faint galaxy to make 
it visible to the telescopes. 

The detected galaxy is 
from the end of the ‘cosmic 
dark ages’ — when ultraviolet 
radiation from the earliest 
stars ionized the Universe's 
hydrogen to generate the levels 
seen today. The authors say 
that studying more galaxies 
like this one could reveal 
whether stars did this alone or 
had help from other sources, 
such as black holes. 
Astrophys. J. Lett. 823, L14 (2016) 


Growth factor treats diabetes


Injecting a protein into rodent brains triggers long-term
remission of type 2 diabetes.
Certain types of protein called fibroblast growth factors (FGFs)
decrease blood glucose levels when they are injected into the
bloodstream of animals. To see whether they target the brain, Michael
Schwartz of the University of Washington in Seattle and his
colleagues injected the brains of rats and mice that had
type 2 diabetes with one-tenth of the amount of FGF1 used
for bloodstream injections. They found that blood glucose
decreased to normal levels 7 days after injection, and
stayed that way for up to about 4 months. FGF1 did
not change body weight, food intake or blood insulin levels,
but glucose was cleared from the circulation into the liver
and skeletal muscles twice as fast in treated mice as in
untreated ones.

Brain injection of FGF1 may combat diabetes by regulating
neural circuits that control how the liver takes up glucose
after meals, pointing the way towards possible drug targets,
the authors speculate.
Nature Med. http://dx.doi.org/10.1038/nm.4101 (2016)


A boom in octopuses and cuttlefish


Cephalopods, such as squid, cuttlefish and octopuses, may be
benefiting from changes to their environment.

Zoé Doubleday and Bronwyn Gillanders at the University of Adelaide
in Australia and their colleagues compiled data from fisheries and
scientific marine surveys on global cephalopod catch rates since 1953.
They found that cephalopod populations (pictured is a Sepia
cuttlefish species) have increased significantly over the past
60 years across some 35 species with different lifestyles, such as
ones that live on the sea floor and in the open ocean.

Cephalopods have short lifespans and rapid growth rates, and are
highly adaptable, which in changing conditions (such as ocean
warming) could give them an advantage over slower-growing organisms,
the authors say. The cephalopod boom, however, could have
damaging effects on their prey populations, such as certain fish
and marine invertebrates.
Curr. Biol. 26, R406-R407 (2016)


End of a Martian ice age


Frosty layers at Mars’s north pole show that the planet is
emerging from an ice age.
Mars experiences big climate shifts because of the way it
tilts on its axis and orbits the Sun. Isaac Smith of the
Southwest Research Institute in Boulder, Colorado, and
his colleagues used a radar instrument aboard the Mars
Reconnaissance Orbiter spacecraft to hunt for signs of
these changes at the north pole. The geometry of the ice layers
(sometimes flat, sometimes cutting across one another)
allowed the scientists to work out the history of
the ice. Some 87,000 cubic kilometres have built up since
the end of the last ice age about 370,000 years ago.

The researchers conclude that the ice age is ending
because most of this ice accumulated at the north
pole, which, unlike on Earth, is warmer than the rest of the
planet during an ice age.
Science 352, 1075-1078 (2016)




CELL BIOLOGY 


How prions kill 
brain cells 


Brain-wasting proteins
called prions kill neurons
by shortening the dendritic
spines that the cells use to
transmit signals to each other.

Prions are infectious and 
cause neurodegenerative 
diseases such as scrapie in 
animals and Creutzfeldt- 
Jakob disease in humans. 

To learn how they kill brain 
cells, David Harris at Boston 
University in Massachusetts 
and his co-workers exposed 
cultured mouse neurons to 
the prion that causes scrapie 
in mice. They found that the 
neurons' dendritic spines
retracted within 24 hours, 
before the cells died. This 
occurred only in neurons 
that made the normal, non- 
infectious form of the prion 
protein, which suggests that 
the disease-associated prion 
might bind to the normal one 
to trigger dendritic loss. 

This method could be used 
to test potential drugs against 
prion diseases, the authors say. 
PLoS Pathog. 12, e1005623 
(2016) 


ARCHAEOLOGY 


Ancient beer 
recipe from China 


A 5,000-year-old brewery in China used what was then
an unusual ingredient: barley.

A team led by Jiajing Wang at Stanford University
in California analysed starch grains from
pottery resembling brewing vessels
(reconstructions pictured), which
were discovered at the Mijiaya site
in northern China about a decade ago.
The vessels contained a mixture of millet,
tubers, a tropical grass known as Job’s
tears, and barley. Starch
grains were swollen and
deformed as though they
had been mashed, a process
that uses hot water to extract
sugars. Chemical analysis
of residues on the pottery
revealed calcium oxalate, a
common by-product of beer
making.

Barley was domesticated 
in Western Eurasia around 
10,000 years ago, but it did not 
become a major crop in China 
until around 2,200 years ago. 
The Mijiaya brewers may have 
seen barley as an exotic treat, 
the authors suggest. 
Proc. Natl Acad. Sci. USA 
http://doi.org/bhwm (2016) 


ANIMAL BEHAVIOUR 


Onlookers boost 
mouse chatter 


Male mice communicate 
more in front of an audience 
than when they are alone. 
Mice live in large social 
groups and communicate 
using ultrasonic frequencies. 
To learn how this social 
environment influences their 
vocalizations, Roian Egnor of 
the Howard Hughes Medical 
Institute's Janelia Research 
Campus in Ashburn, 
Virginia, and her colleagues 
exposed male mice in the lab 
to a female odour. They then 
compared vocal responses 
from animals that were alone 
to those that were in the 
presence of another male. 
Males that had an audience 
produced vocalizations that 
were longer and more 
complex than those 
from solo males. 
The male mice could 
be communicating 
to other males to 
compete for mates, 
the authors suggest.
J. Exp. Biol. 219, 1437-1448 (2016)


ZOOLOGY


Squid may reach 
epic sizes 


Giant squid could measure up 
to 20 metres in length. 

Fewer than 500 of the 
mysterious invertebrates 
(genus Architeuthis) have ever 
been measured. To calculate 
their maximum possible 
size, Charles Paxton of the 
University of St Andrews,
UK, compiled recorded
measurements of giant squid, 
and established relationships 
between measurements such 
as the length of the whole 
body, the mantle and the beak. 
These relationships, and the 
variation between observed
squid lengths and beak sizes, 
suggest that the animals could 
plausibly be 20 metres long. 

Paxton says that some giant 
squid may grow too large 
to be eaten by some of their 
predators, such as female 
sperm whales. 

J. Zool. http://doi.org/bhwq (2016) 


Genetic clues to 
more rubber 


The genome of the rubber tree 
has revealed a group of genes 
that may drive the plant’s 
unique ability to produce vast 
amounts of rubber. 

Scientists had previously 
released a draft sequence, 
but Chaorong Tang at the 
Chinese Academy of Tropical 
Agricultural Sciences in 
Danzhou and his colleagues 
now report a more complete 
genome sequence for the plant 
(Hevea brasiliensis; pictured). 
Four members of the 
REF/SRPP gene family, which 
are thought to be involved in 
rubber synthesis, were among 
the most highly expressed 
genes in latex, the white fluid 
from which natural rubber is 
obtained. The researchers also 
identified more than 500 genes 
that respond to ethylene, a 
plant hormone known to 
stimulate rubber production. 

These findings could help
to guide efforts to breed
higher-yield versions of the
commercially important plant,
the authors say.
Nature Plants http://doi.org/bhwn (2016)


MATERIALS 


Light heals defects 
in solar-cell film 


Intense light shining on a 
material used in experimental 
solar cells can improve its 
performance. 

Perovskite films promise 
to increase the efficiency of 
solar cells, but imperfections 
in the material, called traps, 
limit further gains. A team 
led by Samuel Stranks of the 
Massachusetts Institute of 
Technology in Cambridge 
found that intense light 
reduces the density of the 
traps tenfold, boosting
performance. Chemical 
imaging revealed that iodine 
ions migrated away from 
the illuminated areas, and 
the authors suggest that this 
effectively swept the traps 
away. 

The effect fades over time, 
but the authors hope to 
devise a method with longer- 
lasting effects for commercial 
applications. 

Nature Commun. 7, 11683 (2016) 




2 JUNE 2016 | VOL 534 | NATURE | 9 


© 2016 Macmillan Publishers Limited. All rights reserved. 


SEVEN DAYS


ITER improvements 


The nuclear-fusion project 
ITER has improved its 
performance and management, 
and the United States should 
continue to support it at least 
until 2018, the US Department 
of Energy said in a report 
released on 26 May. ITER is
a collaboration between the
European Union, China, India,
Japan, South Korea, Russia and
the United States. Its goal is
to show that fusing hydrogen
nuclei to make helium is
a feasible way to produce
electricity. The multibillion- 
euro experiment is under 
construction in southern 
France, but the work is more 
than a decade behind schedule, 
and its costs have spiralled. See 
page 16 for more. 


Science for all 


Ministers from the European Union's 28 member states have
agreed that open access to scientific publications should
become the common standard across the bloc by 2020. The
EU Competitiveness Council, which met in Brussels on
26-27 May, announced the target following a public debate
of broader plans to develop ‘open science’. The aim is to
make research and data more freely available to scientists
and to the wider society. The meeting's conclusions held few
specific details as to how the target might be reached, but
prioritize open access on the EU political agenda.


NUMBER CRUNCH


263,211


The number of extra deaths from cancer during the
financial crisis of 2008-10 in countries that are members
of the Organisation for Economic Co-operation and
Development. Countries with universal health-care systems
seemed to be protected from this impact, according to a
study in The Lancet.

Source: M. Maruthappu et al. Lancet
http://doi.org/bhzz (2016)


Better barley boosts Ethiopian brewing


Two high-yielding varieties of malt barley
might help Ethiopian smallholders. The
strains can produce yields of up to 6 tonnes per
hectare, triple that of the average traditional
crop (pictured). They were released on
26 May by the Holetta Agricultural Research
Center near Addis Ababa, after decades of
collaboration with the International Center
for Agricultural Research in the Dry Areas,
headquartered in Lebanon. Demand for the
crop, for food and for Ethiopia's burgeoning
beer industry, is outstripping supply, with
shortages in 2015 forcing some breweries to
cut production.


Chemical reform


The US House of Representatives approved
a historic bill on 24 May, strengthening oversight of
both new and old chemicals. The bipartisan legislation
would overhaul the 1976 Toxic Substances Control
Act (widely considered ineffective) and expand
the US Environmental Protection Agency’s authority
to ensure that chemicals are safe. The bill, which comes
with endorsements from the White House and industry,
and cautious support from many environmental groups,
is expected to pass the Senate soon. See pages 5 and 18 for
more.


FUNDING


French cuts


The French government has backtracked over part
of a plan to cut €256 million (US$285 million) from this
year’s research and higher-education budget after
8 eminent French scientists called the plan “scientific and
industrial suicide”. In response, President François Hollande
promised on 30 May to reduce the proposed cuts (1.1% of
the total research and higher-education budget) by
€134 million. High-profile research agencies, namely the
Alternative and Atomic Energy Commission (CEA), the
National Centre for Scientific Research (CNRS), the National
Institute for Agricultural Research (INRA) and the
computer-science institute Inria, will be spared from
cuts, he said.


Telescope record 
The European Southern Observatory signed a €400-million
(US$448-million) contract, the largest ever for a ground-based
telescope, on 25 May for construction of the dome and
structure of the European Extremely Large Telescope (E-ELT).
Building is expected to begin in 2017 on the 3,000-metre-high
Cerro Armazones peak in Chile, and should be completed in
2024. With a primary mirror 39 metres in diameter and a
footprint the size of a football pitch, the E-ELT will be the
largest optical telescope on Earth. Construction for the
slightly smaller Thirty Meter Telescope in Hawaii has been
stalled by protests by Native Hawaiians.


Iranian physicist 
Omid Kokabee, a laser physicist who has been in an Iranian
jail for more than five years for “communicating with a
hostile government”, has been bailed on temporary medical
leave following surgery for kidney cancer, sources tell
Nature. The 33-year-old scientist, who had studied at the
University of Texas at Austin, left a hospital in Tehran on
25 May after his friends posted bail of 5 billion Iranian
rials (US$165,000). They hope to extend his leave using an
article of Iran’s penal code that permits the postponement of
a sentence that may harm a prisoner's health. Kokabee
(pictured) was arrested in Iran in 2011, while visiting
family, and was sentenced to 10 years in prison for alleged
espionage, which he denies. Numerous appeals have been made
for his release by scientific and human-rights organizations.
See go.nature.com/xp4vza for more.


TREND WATCH

A survey of almost 250 biomedical scientists suggests that
80% feel that preprint servers (on which manuscripts are
posted before peer review and formal publication) should not
be hosted by for-profit organizations. The poll was conducted
ahead of a 24 May workshop convened by ASAPbio in Bethesda,
Maryland (see go.nature.com/wtkqls). The group is coordinating
efforts to implement preprint servers for biology, including
making plans for their governance.


Fossil schemes 


Donald Trump, the presumptive Republican nominee for the
forthcoming US presidential elections, promised on 26 May to
roll back environmental regulations, promote domestic
fossil-fuel production and pull the United States out of the
2015 Paris climate agreement if elected. Speaking in North
Dakota, where the oil boom has collapsed owing to low oil
prices, Trump accused the administration of President Barack
Obama of using “totalitarian tactics” and implementing
“draconian climate rules” to halt the use of fossil fuels. A
global-warming sceptic, Trump said that his administration
would deal with “real environmental challenges, not phony
ones”.


Next Science editor 
Jeremy Berg will become the editor-in-chief of the
Science family of journals, the American Association for
the Advancement of Science (AAAS) announced on 25 May.
He will succeed Marcia McNutt when she leaves on 1 July to
start her role as president of the US National Academy of
Sciences. Berg is currently 
associate senior vice-chancellor 
for science strategy and 
planning in the health sciences 
at the University of Pittsburgh, 
Pennsylvania. He is a former 
director of the US National 
Institute of General Medical 
Sciences, and will be the 20th 
holder of the AAAS post. 


Phone doubts 


The preliminary findings of a huge animal study are fuelling
ambiguity over possible health risks from mobile-phone use.
In partial findings uploaded to the bioRxiv preprint website
on 27 May, researchers with the US$25-million US National
Toxicology Program (NTP) report that up to 3% of male rats
that were exposed to levels of radiation higher than most
phone users would experience developed malignant brain and
heart tumours (M. Wyde et al. Preprint at bioRxiv
http://doi.org/bjfm; 2016). The NTP plans to release data
from a similar mouse study in 2017. Whether the final results
of the studies may be relevant to humans is unclear.
Peer-reviewed studies have previously found no cancer risk
associated with mobile-phone use in humans.


PREPRINTS NOT PROFITS
A survey by ASAPbio suggests that the biology community
strongly favours the non-profit model for hosting preprints.

[Bar chart: number of responses, from ‘strongly disagree’ to
‘strongly agree’, to the statement ‘A preprint server should
be hosted by a:’ for a for-profit entity and for a non-profit
entity.]


SEVEN DAYS | THIS WEEK

5-9 JUNE
Astrophysicists and science historians ponder the Science of
Time (past, present and future) at a meeting in Cambridge,
Massachusetts.
go.nature.com/mhsft6

6-10 JUNE
The biennial Conference on Mathematical Geophysics takes
place in Paris. The meeting focuses on experimental
investigation as well as theoretical and modelling work.
go.nature.com/Iwekvs


Space inflation 


Astronauts on the International Space Station successfully
tested a flexible orbital habitat on 28 May. The Bigelow
Expandable Activity Module, which inflates from 1.7 metres to
4 metres long, is meant to provide an extra 16 cubic metres
for living and working in deep space. The module had some
initial problems expanding, owing to unexpected friction
between layers of its fabric, but was eventually brought to a
pressure equal with that of the rest of the space station.
The module will remain in orbit for two years, and serve as a
test for possible bigger versions in the future.


PUBLISHING Innovative journal eLife gets fresh tranche of
cash p.13

SPACE ‘Chipsat’ launch tests alternative way to explore
Solar System p.14

Why expansion of US chemical regulation matters p.18

EPIDEMIOLOGY High-speed video shows how far sneezes really
spread p.24


The health of coral reefs is normally assessed by scuba surveys and other close-up views. 


Reefs mapped from above 


Satellites and research aeroplanes could offer a better, broader view of coral health. 


BY ALEXANDRA WITZE 


Eric Hochberg has studied coral reefs for two decades, but the
marine ecologist is about to see them in a fresh light.
Beginning on 6 June, Hochberg and his colleagues will use a
specially outfitted NASA aeroplane to map the spectra of
sunlight reflecting off reefs spread across the Pacific Ocean
far below. The scientists aim to tease out the spectral
signatures of coral, algae and sand, and to check the health
of the reefs.

The three-year, US$15-million Coral Reef Airborne Laboratory
(CORAL) project will be the biggest and most detailed study
yet of entire reefs, rather than just the small patches that
scuba divers can reach. CORAL is part of a growing push to map
reefs faster, and in more detail, than ever before. Marine
scientists are putting new instruments onto planes, satellites
and even drones to gain a broader perspective on how well
corals are doing, or not.

After its surveys in Hawaii, Australia’s Great 
Barrier Reef, the Mariana Islands and Palau 
(see ‘Under the sea’), CORAL will have mapped
about 3-4% of the world’s reef area, hundreds of 
times more than previous scuba surveys. 

Warming ocean waters have led to 
massive coral-bleaching events such as the one 
now devastating the Great Barrier Reef. The 
CORAL scientists hope to learn how individ- 
ual reefs respond to such threats. “We want to 
start looking at things at the ecosystem scale, 
which is really hard to do in the water,” says
Hochberg, at the Bermuda Institute of Ocean
Sciences in St George's.

Remote sensing of coral reefs is hard because
the oceans reflect so much less light than the 
land, says Heidi Dierssen, a marine ecologist 
at the University of Connecticut Avery Point 
in Groton, who is part of the CORAL team. 
And scientists have to do elaborate calcula- 
tions to correct for the distortion of light on its 
journey through the atmosphere and through 
water — a bright, deep ocean bottom and a 
dark, shallow bottom can both look the same 
to a remote-sensing camera. 

Teasing out such distinctions requires scan- 
ning an area using as many wavelength bands 
as possible. “When you have the full spectrum, 
you can say so much more about what is there,” 
Dierssen says. 

One of the latest views from above comes from the Sentinel-2
satellite, launched by the European Space Agency in June 2015.
Although the satellite was not designed to study reefs, it has
relatively sharp vision and can operate over more and narrower
spectral bands than the US Geological Survey’s Landsat-8
satellite, another workhorse of Earth observing. And unlike
data from keen-eyed commercial satellites, Sentinel’s
observations are free to use.


UNDER THE SEA

Over the next three years, a NASA research aeroplane will
survey coral reefs throughout the Pacific Ocean, including
the rich ecosystems of the Great Barrier Reef in Australia.

[Map of the Pacific Ocean showing all reefs, validation sites
and CORAL survey regions.]


Sentinel-2 will also eventually revisit the same spot every
5 days, compared with Landsat-8's 16-day return period. That
makes it a better choice for studying short-term marine
phenomena such as coral bleaching and algal blooms, says John
Hedley, a remote-sensing expert at Environmental Computer
Science in Tiverton, UK, who is on the science team for the
Sentinel-2 coral study, Sen2Coral.

Team members are set to report early results on mapping reef
bottoms at a coral-reef symposium in Honolulu, Hawaii, on
22 June.

But in the wavelength range applicable to underwater sensing
(430-710 nanometres), Sentinel-2 cannot capture details that
CORAL's plane can. The plane carries an instrument that
gathers data in more than 100 narrow spectral bands in that
range, including the signature of photosynthetic organisms
within the living coral itself at 570-575 nanometres.

CORAL will focus on one simple metric: how much coral cover
there is on a given reef, as opposed to algae and sand. From
that, researchers can calculate how well the coral is doing at
transforming sunlight into energy to maintain a reef
structure. Hochberg and his colleagues hope to use that
information to better understand how local changes, such as an
increase in pollution, might affect coral’s health.

The June flights in Hawaii will test whether all the equipment
is working. From there, the Gulfstream IV plane will go to the
Great Barrier Reef in September and October, followed by
Hawaii, the Mariana Islands and Palau in 2017. Divers will
simultaneously measure the optical properties of the
surrounding seawater and the reef condition up close, to
cross-check what the plane sees from 8,500 metres above.

The flights will provide a snapshot of some of the world’s
most important reefs, says Serge Andréfouët, a marine
ecologist at the Research Institute for Development (IRD) in
Nouméa, New Caledonia, who led an earlier coral-mapping effort
with the Landsat-7 satellite.

But CORAL will be a one-time glimpse only. 
With limited funding, there are no plans to 
repeat any flights to see how the reefs change 
over time, Hochberg says. 

Instead, the team hopes to provide a rich set 
of baseline data for future coral studies. “You 
have to pick and choose where you go to try 
to understand how the ecosystem is working,” 
he says.


SOURCE: ERIC HOCHBERG 


Biology’s big funders boost eLife


Open-access journal nets £25 million in support until 2022. 


BY EWEN CALLAWAY 


When three of the world’s biggest
private biomedical funders
launched the journal eLife in 2012,
they wanted to shake up the way in which
scientists published their top papers. The new 
journal would be unashamedly elitist, competing
with biology’s traditional ‘big three’,
Nature, Science and Cell, to publish the best 
work. But unlike these, eLife would use work- 
ing scientists as editors, and it would be open 
access. And with backers providing £18 mil- 
lion (US$26 million) over five years, authors 
wouldn't need to pay anything to publish there. 

Four years and more than 1,800 publications 
later, eLife’s funders — the Howard Hughes 
Medical Institute in Chevy Chase, Maryland, 
the Wellcome Trust in London and the Max 
Planck Society in Berlin — announced on 
1 June that they will continue their support. 
They will back the non-profit eLife organization
with a further £25 million between 2017 and
2022 (see ‘eLife by the numbers’). 

“eLife’s status in the field is rising quite 
quickly,” says Sjors Scheres, a structural biolo- 
gist at the Laboratory of Molecular Biology in 
Cambridge, UK. He became an editor at the 
journal in 2014, overseeing papers on electron 
microscopy. “I liked the idea behind it — to 
make a high-impact journal completely driven 
by scientists, and open,” he says. Although scientists
like publishing in the journal, it’s less
clear whether it has catalysed a wider transfor- 
mation at the elite end of science publishing. 


COLLABORATIVE ATTRACTION 

The journal's most innovative feature, according 
to its authors and reviewers, is its collaborative 
peer-review process. It turns conventional peer 
review — in which referees submit individual, 
and sometimes contradictory, reports — on its 
head. Instead, referees and scientist-editors 
work together to identify a submitted paper’s
strengths and weaknesses and any needed
revisions. Authors receive one decision letter, 
not individual reports from each referee. 

That makes for a speedy review: last year, 
eLife’s published papers took, on average, 
116 days from submission to acceptance. For 
comparison, Nature and Cell take around 
150 days, although Science says that in 2013 it 
took 99 days from submission to acceptance. 
Cell and two of its sister journals have experi- 
mented with a similar peer-review model but 
none has yet adopted it. Peter Binfield, the 
publisher of another open-access journal, 
PeerJ, in San Francisco, California, says that he
likes eLife’s peer-review system, but he thinks 
that the approach would be impossible to scale 
up to adopt for all published articles. 


SELECTIVE BUT OPEN 

As it bids to become a top journal, eLife has 
started to turn down more of its submissions. 
The journal’s acceptance rate dropped from
26% in 2014 to 15.4% by 2015, says its editor-in-chief
Randy Schekman, a cell biologist
at the University of California, Berkeley.
That’s approaching the acceptance rates of 
Nature and Science, which are both below 
10%. 

In 2013, Schekman denounced Nature, 
Science and Cell as “luxury journals”, and
likened their low acceptance rates and 
high impact factors to high-end “fashion 
designers” that artificially stoke demand for 
their brand through scarcity. Now, he says, 
eLife has become “more selective than I had 
imagined, but it’s not based on any instruc- 
tions I have conveyed to the editors. It’s based 
on their sensibility of important work.”

In 2014, the most recent year for which 
financial information is publicly available, 
eLife published 537 research articles with 
expenses of £3.4 million — equating to 
around £6,300 for each article. “It appears 
to be a very expensive way to innovate in 
the publishing space,” says Binfield.

The journal says that its per-article cost 
has dropped — to £3,522 in 2015. It points 
out that it spends money on technology 
development, too. Six publishers that use the 
third-party publishing platform HighWire 
have tested the eLife-developed Lens dis- 
play technology, for instance. Schekman 
says that eLife plans to diversify its income 
by asking governments and other charities 
for funding. It will also eventually charge 
scientists to publish in the journal. But it 
won’t, he says, establish other open-access
journals that accept more papers and have 
lower selectivity — a strategy that some 
have used to shore up finances. “We have 
no interest in creating other lesser journals 
with lower standards,” he says.


ELIFE BY THE NUMBERS

£43 million


Amount committed to the journal over 
ten years (2012-22) by the Wellcome 
Trust, Howard Hughes Medical Institute 
and Max Planck Society. 


848 


Research articles published in 2015 — 
all open access. 


116 days 


Median time to acceptance of paper, 
2015. 


13.4% 


Acceptance rate in 2015.


A KickSat satellite (artist’s impression) will launch several minuscule chipsats.


First flight for 
tiny satellites 


Launch of ‘chipsat’ probes in July will test a new way
to explore the Solar System — and beyond. 


BY NICOLA JONES 


On 6 July, if all goes to plan, a pack of
about 100 sticky-note-sized ‘chipsats’
will be launched up to the International
Space Station for a landmark deployment.
During a brief few days of testing, the
minuscule satellites will transmit data on their 
energy load and orientation before they drift 
out of orbit and burn up in Earth’s atmosphere. 
The chipsats, flat squares that measure just 
3.2 centimetres to a side and weigh about 
5 grams apiece, were designed for a PhD pro- 
ject. Yet their upcoming test in space is a baby 
step for the much-publicized Breakthrough 
Starshot mission, an effort led by billionaire 
Yuri Milner to send tiny probes on an inter- 
stellar voyage. 

“We're extremely excited,” says Brett 
Streetman, an aerospace engineer at the non- 
profit Charles Stark Draper Laboratory in 
Cambridge, Massachusetts, who has inves- 
tigated the feasibility of sending chipsats to 
Jupiter’s moon Europa. “This will give flight 
heritage to the chipsat platform and prove
to people that they’re a real thing with real
potential.”

The probes are the most diminutive 
members of a growing family of small satel- 
lites. Since 2003, researchers have launched 
hundreds of 10-centimetre-sided CubeSats — 
more than 120 last year alone. Engineer Jekan 
Thanga at Arizona State University in Tempe 
is now working on an even smaller ‘femtosatellite’,
a 3-centimetre cube that he says has
the technological capacity of the first CubeSats. 
Chipsats, which are smaller and cheaper still, 
are seen as disposable sensors that could be 
sent on suicide missions to explore hostile 
environments, such as Saturn’s rings. 

“They're all part of the toolbox for next-generation
space missions,” says Thanga.

The upcoming chipsat test, called KickSat-2, 
is the second incarnation of a crowdfunded 
mission developed by researchers at Cornell 
University in Ithaca, New York. The shoebox- 
sized KickSat-1 spacecraft successfully launched 
on 18 April 2014, but it failed to deploy its cargo 
of 104 chipsats after a cosmic radiation burst 
reset the clock on its release mechanism. The
craft fell out of orbit and burned up with
the chipsats still in its hold. 

“I was a little bummed out,” says Zachary
Manchester, an aerospace engineer who 
built the satellites as a doctoral student 
in aerospace engineering at Cornell. 
Fortunately, enough spare parts were lying 
around to make a second batch relatively 
quickly and easily. 

The chipsats, called Sprites, carry little 
more than a pair of 60-milliamp solar cells, 
a radio and an antenna. The KickSat-2 
payload includes some newer Sprites that 
can ‘sail’ by tilting towards or away from the 
Sun. A current is run through a coil, turning 
the chip into a compass needle that aligns 
with Earth’s magnetic field, allowing the 
chipsat to control its orientation. The probes 
can be reprogrammed on the fly from the 
space station. 

Sprite prototypes have already proved 
that they can survive the rigours of space. 
In 2011, three chipsats were attached to the 
outside of the space station. They were still 
working when scientists retrieved them in 
2014. 

That commercial electronics are good 
enough to survive space’s vacuum and 
extreme temperatures is a “pretty big deal”,
says Mason Peck, an aerospace engineer 
who leads Cornell’s chipsat team. But on 
a flight into deep space, chipsat electron- 
ics would face a high risk of damage from 
radiation. “There are some clear paths to 
radiation hardening, but it’s expensive,” says 
Peck. “And that’s not the point. You don’t 
want to make an exquisite satellite. You just 
launch a million; if only 1% survive then 
that’s fine. You put statistics on your side.” 

There is plenty of science that Sprites can 
do closer to home. Peck says that the tiny 
satellites could be used to verify models 
of how small bits of debris behave in the 
upper atmosphere. Like feathers on Earth, 
the small, flat objects would be heavily 
affected by drag. “We're not very good at 
modelling that,” says Peck. Another potential
project would be to use Sprites to make
a high-spatial-resolution map of Earth's 
magnetic field. 

“That would be really useful,” agrees 
Jeffrey Love, a geophysicist with the US 
Geological Survey in Denver, Colorado, 
who studies Earth’s magnetism. “Ideally 
you’d want to be measuring it everywhere
all the time. This could be a step in that
direction.”

For the long-term interstellar goal, 
chipsats will need much better laser- 
communication capacity. That should be 
possible, say Peck and Manchester, who 
are both on the Breakthrough Starshot 
advisory committee. 

“We have gone a long way towards prov- 
ing we can have a functional tiny craft,” says 
Peck.


The gigantic ITER project is currently under construction in southern France.


NUCLEAR PHYSICS 


US urged to stay 
in fusion project 


Department of Energy says US should fund ITER until 2018. 


BY DAVIDE CASTELVECCHI 
& JEFF TOLLEFSON 


The troubled nuclear-fusion experiment
ITER has received a cautious vote of
confidence from the US Department of 
Energy (DOE). The multibillion-euro project 
has improved its performance and manage- 
ment, and the United States should continue 
to support it, at least until 2018, the DOE said 
in a report to Congress released on 26 May. But
after that, the agency said, the country should 
re-evaluate its position. 

ITER is a collaboration between the Euro- 
pean Union, China, India, Japan, South Korea, 
Russia and the United States. Its goal is to show 
that fusing hydrogen nuclei to make helium,
the same process that heats up the Sun and
powers hydrogen bombs, is a technologically
feasible way to produce electricity.

The reactor is under construction in 
St-Paul-lez-Durance in southern France, but 
the work is more than a decade behind sched- 
ule, and its costs have spiralled. The latest 
report comes against a backdrop of criticism 
directed at ITER’s former management. 

The DOE acknowledges ITER’s scientific 
potential, and the substantial improvements
since current director-general Bernard Bigot
took over in March 2015. “ITER remains the 
best candidate today to demonstrate sustained 
burning plasma, which is a necessary precursor
to demonstrating fusion energy power,” US
energy secretary Ernest Moniz writes in the 
report's introduction. But the agency says that 
the progress “must be balanced against several 
years of inadequate performance”. Its recommendation
to continue US funding for ITER
is contingent on continued and sustained pro- 
gress on the project, increased transparency 
and a suite of management reforms. 

“I think it’s an outstanding report that says
all of the right things,” says William Madia, a 
former director of Oak Ridge National Labo- 
ratory in Tennessee who led an independent 
review of ITER in 2013. That report excoriated 
the way in which ITER was run, and proposed 
reforms to save it from failure — recommenda- 
tions that ITER’s governing council embraced. 

Madia says that the DOE is appropriately 
encouraged by recent management changes, 
and appropriately cautious about whether the 
project is actually back on track. “Bernard is 
doing a terrific job, but, my goodness, he’s got a 
lot of work to do,” he says. Bigot acknowledges
this, and says that the DOE’s conclusions are
the most he could have hoped for at this point:
“We know there is still a long way to go.” 

The DOE is a major funder of fusion 
research. But although the United States is 
bound by an international treaty to provide its 
share of ITER’s costs — a relatively small 9% of 
the project’s budget — it cannot meet its con- 
tributions if Congress does not approve them. 


GROWING BUDGET 

The report's recommendations have provoked 
scepticism on Capitol Hill. Senator Dianne 
Feinstein of California, the highest-ranking 
Democrat on the Senate panel that oversees 
DOE spending, says that the United States 
cannot afford to keep pace with ITER’s growing 
budget. The DOE estimates that the country’s 
annual contribution, currently US$115 million, 
will more than double by 2018. 

Last year, the Senate proposed to end 
support for ITER, but backed down during 
final negotiations with the House of Repre- 
sentatives. This year, it is not clear that ITER 
will win a reprieve. On 12 May, the Senate 
approved an energy-funding bill for fiscal year 
2017 that cut all spending on ITER. And on 26 
May, the House rejected its own 2017 energy- 
spending bill, which included money for ITER. 

Without the United States, ITER would probably survive, says
Mark Koepke, a plasma physicist at West Virginia University in
Morgantown who leads a government advisory panel on fusion
research. But in April, Bigot told US lawmakers that the
country’s fusion expertise would be difficult to replace.
Madia says that the effect of a US exit is impossible to
predict: “It makes good cocktail conversation, but no one
knows what would actually happen.”

ITER’s approach to fusion is to trap heavy isotopes of
hydrogen in a doughnut-shaped vacuum vessel called a tokamak
and heat them to 150 million °C. This
should force their nuclei to fuse, releasing vast 
amounts of energy. Other tokamaks exist, but 
ITER would be the first to release substantially 
more energy than was put into the hydrogen 
plasma. 

Begun in 2007, the project was originally 
due to be completed in 10 years for €5 billion 
(US$5.6 billion). Observers say that under pre- 
vious director-general Osamu Motojima, who 
was in office from 2010 to 2015, the experi- 
ment was in denial about slipping deadlines 
and witnessed a drop in staff morale. After 
the independent review by Madia, the ITER 
Council accelerated the transition to a new 
director-general, nominating Bigot, a French
nuclear physicist with extensive management
experience, in late 2014. 

By November 2015, Bigot’s team had 
presented a revised timetable for the project, 
and estimated that it would cost an extra 
€4.6 billion to bring to completion. The team 
said that the earliest possible date for getting 
hydrogen plasma to run inside the machine 
was 2025, and that it would take several more 
years to inject the heavier hydrogen isotopes 
tritium and deuterium, and achieve fusion. 

In April, an external review from the ITER 
Council Working Group confirmed that pro- 
gress had been made on the recommendations 
of the Madia report, and that the new manage- 
ment had been realistic about the earliest pos- 
sible date for plasma. But it pointed out that the 
estimates of costs and the completion date did 
not take into account possible contingencies. 

The latest DOE report recommends funding 
the cost increases cited by Bigot, but remains 
sceptical about the schedule. It outlines two 
funding scenarios: one based on achieving first 
plasma in 2025, and a more realistic scenario 
that pushes the date back to 2028. 

Bigot’s team also proposed a more modest 
plan, which achieves first plasma on time but 
delays fusion. This should save money by post- 
poning the parts of construction that are not 
needed for first plasma, but no one has yet calculated
how much.


MATHEMATICS


Maths proof smashes size record 


Supercomputer produces a 200-terabyte proof — but is it really mathematics? 


BY EVELYN LAMB 


Three computer scientists have announced the largest-ever
mathematical proof: a file that comes in at a whopping
200 terabytes, equivalent to all the digitized text held by
the US Library of Congress. The researchers have created¹ a
68-gigabyte compressed version of their solution, which would
allow anyone with about 30,000 hours of spare processor time
to download, reconstruct and verify it, but a human could
never hope to read through it.

Computer-assisted proofs too large to be 
directly verifiable by humans have become 
common, as have computers that solve prob- 
lems in combinatorics — the study of finite 
discrete structures — by checking through 
umpteen individual cases. Still, “200 terabytes 
is unbelievable’, says Ronald Graham, a math- 
ematician at the University of California, San 
Diego. The previous record-holder is thought 
to be a 13-gigabyte proof², published in 2014.

The puzzle that required the 200-terabyte 
proof, called the Boolean Pythagorean triples
problem, has troubled mathematicians for decades.
In the 1980s, Graham offered a prize of
US$100 for anyone who could solve it. (He pre- 
sented the cheque to one of the three computer 
scientists, Marijn Heule of the University of 
Texas at Austin, last month.) The problem asks 
whether it is possible to colour each positive 
integer either red or blue, so that no trio of integers
a, b and c that satisfy Pythagoras’ famous
equation a² + b² = c² are all the same colour. For
example, for the Pythagorean triple 3, 4 and 5,
if 3 and 5 were blue, 4 would have to be red.

In a paper¹ posted on the arXiv server on
3 May, Heule, Oliver Kullmann of Swansea 
University, UK, and Victor Marek of the Uni- 
versity of Kentucky in Lexington show that 
there are many allowable ways to colour the 
integers up to 7,824 — but when you reach 
7,825 or above, it is impossible for every 
Pythagorean triple to be multicoloured. 
There are more than 10^2,300 ways to colour
the integers up to 7,825, but the researchers 
took advantage of symmetries and several 
techniques from number theory to reduce 
the number of possibilities that the computer 
had to check to just under 1 trillion. It took 
about 2 days running 800 processors in par- 
allel on the University of Texas’s Stampede 
supercomputer to zip through all the pos- 
sibilities. The researchers then verified the 
proof using another computer program. 
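
To make the problem concrete, the following is a minimal Python sketch of the colouring question on a toy scale. It is not the researchers' method: it uses plain backtracking, which is only practical for small limits, and the function names (`pythagorean_triples`, `is_valid`, `find_colouring`, `count_colourings_digits`) are illustrative choices rather than anything taken from the paper.

```python
from math import isqrt


def pythagorean_triples(n):
    """Return all (a, b, c) with a < b < c <= n and a*a + b*b == c*c."""
    triples = []
    for a in range(1, n + 1):
        for b in range(a + 1, n + 1):
            c_squared = a * a + b * b
            c = isqrt(c_squared)
            if c > n:
                break  # c only grows as b grows, so stop this inner loop
            if c * c == c_squared:
                triples.append((a, b, c))
    return triples


def is_valid(colouring, triples):
    """True if no triple has all three members in the same colour."""
    return all(
        len({colouring[a], colouring[b], colouring[c]}) > 1
        for a, b, c in triples
    )


def find_colouring(n):
    """Search for a red/blue (0/1) colouring of 1..n by simple backtracking.

    Returns a dict {integer: colour}, or None if no colouring exists.
    Assumption: n is small; this naive search does not scale to 7,825.
    """
    triples = pythagorean_triples(n)
    by_member = {}
    for triple in triples:
        for x in triple:
            by_member.setdefault(x, []).append(triple)
    constrained = sorted(by_member)  # numbers that occur in some triple
    colour = {}

    def consistent(x):
        # Reject only triples that are fully coloured and monochromatic.
        for triple in by_member[x]:
            colours = [colour.get(y) for y in triple]
            if None not in colours and len(set(colours)) == 1:
                return False
        return True

    def backtrack(i):
        if i == len(constrained):
            return True
        x = constrained[i]
        for c in (0, 1):
            colour[x] = c
            if consistent(x) and backtrack(i + 1):
                return True
        del colour[x]
        return False

    if not backtrack(0):
        return None
    # Numbers that appear in no triple can take either colour; use 0.
    return {k: colour.get(k, 0) for k in range(1, n + 1)}


def count_colourings_digits(n):
    """Number of decimal digits in 2**n, the count of all 2-colourings of 1..n."""
    return len(str(2 ** n))
```

For example, `find_colouring(30)` returns a colouring in which none of the 11 triples up to 30 is monochromatic, while `count_colourings_digits(7825)` gives a sense of why checking every colouring of the integers up to 7,825 directly is out of the question.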

The Pythagorean triples problem is one of 
many similar questions in Ramsey theory, 
an area of mathematics that is concerned 
with finding structures that must appear 
in sufficiently large sets. For example, the 
researchers think that if the problem had 
allowed three colours, rather than two, they 
would still have hit a point where it would 
have been impossible to avoid creating a 
Pythagorean triple where a, b and c were 
all the same colour; indeed, they conjecture 
that this is the case for any finite choice of 
colours. Any proof for more colours will 
probably be even larger than the 200-tera- 
byte 2-colour proof, unless researchers can 
simplify the case-by-case checking process 
with a breakthrough in understanding. 

Although the computer solution has 
cracked the Boolean Pythagorean triples 
problem, it hasn't provided an underlying 
reason why the colouring is impossible, 
or explored whether the number 7,825 is 
meaningful, says Kullmann. That echoes 
a common philosophical objection to the 
value of computer-assisted proofs: they may 
be correct, but are they really mathematics? 
If mathematicians’ work is considered to be 
a quest to increase human understanding 
of mathematics, rather than to accumulate 
an ever-larger collection of facts, a solution 
that rests on theory seems superior to a 
computer ticking off possibilities. 

In the case of the 13-gigabyte proof²
from 2014, which solved a special case of
a question called the Erdős discrepancy
problem, a theory-based solution was 
eventually found. Mathematician Terence 
Tao of the University of California, Los 
Angeles, solved³ the general problem the
old-fashioned way in 2015 — a much more
satisfying resolution.


1. Heule, M. J. H., Kullmann, O. & Marek, V. W. Preprint at http://arxiv.org/abs/1605.00723 (2016).

2. Konev, B. & Lisitsa, A. Preprint at http://arxiv.org/abs/1402.2184 (2014).

3. Tao, T. Preprint at http://arxiv.org/abs/1509.05363 (2015).


Naphthalene is one of the chemicals slated for review by the US Environmental Protection Agency.


US chemicals law set for overhaul


Bill would give government more authority to regulate potentially toxic substances.


BY JEFF TOLLEFSON 


The US Congress is poised to overhaul
the law that governs the introduction
and use of chemicals, in one of the most
significant changes to the country’s environ- 
mental regulation in decades. The update to 
the 1976 Toxic Substances Control Act (TSCA) 
comes after more than ten years of debate, and 
many failed attempts to revamp the law. 

The US House of Representatives passed the 
bill with overwhelming bipartisan support on 
24 May. The Senate is expected to approve the 
measure soon, clearing the way for US Presi- 
dent Barack Obama to sign it into law. 

Nature takes a look at the implications of the 
historic deal, which will give the US Environ- 
mental Protection Agency (EPA) new power 
to ensure that chemicals — both old and new 
— are safe. 


Why amend the current law? 
Critics of the TSCA have long complained that 
the law effectively ties the EPA’s hands, pre- 
venting the agency from examining the safety 
of known chemicals and making it difficult to 
ensure that new ones do not pose undue health 
hazards. 

The law requires companies to register new 
chemicals before they are used in products and
industrial processes, but the default assumption 
is that all chemicals are safe. Unless the EPA can 
show that a given chemical poses an unreason- 
able risk to human health or the environment, 
that chemical is automatically approved for use. 
Companies do not have to provide the agency 
with much information about their chemicals, 
and the EPA cannot require industry to con- 
duct additional research without solid evidence 
that a chemical poses a health risk. 


How many chemicals does the EPA regulate? 
Companies introduce about 700 chemicals 
into the marketplace each year. And 40 years 
after the TSCA became law, the EPA’s chemical 
inventory lists 85,000 substances. But nobody 
knows exactly how many of these chemicals 
are still in use. 

The EPA has identified 90 chemicals that 
merit further investigation, and possibly regu- 
lation. But only about 2% of the chemicals in 
use today have undergone a safety review by 
government scientists, according to the Envi- 
ronmental Defense Fund, a watchdog group 
in New York City. 


So, what will change? 
In short, everything. Once the TSCA is
amended, the EPA will have the authority to
ask questions, seek more information and even 
require companies to conduct additional stud- 
ies to ensure that chemicals are safe. 

The US law requires less information about 
chemicals up front than Europe’s pioneering 
chemical-safety legislation, but the two regu- 
latory approaches should have similar results, 
says Richard Denison, a biochemist who 
tracks chemical safety for the Environmental 
Defense Fund. 

“The EPA now has to make an affirmative 
finding that a chemical is safe in order for that 
chemical to go on the market,” he says. Moreo- 
ver, Denison notes, the legislation allows the 
EPA to determine the risks posed by a chemi- 
cal without considering the economic implica- 
tions of that decision. 

The revised law will also restrict companies’ 
ability to withhold information about chemi- 
cals from the public by arguing that the data 
are a trade secret. Whereas most claims for 
confidentiality sail through under the current 
law, Denison says that the amended statute will 
require firms to provide detailed explanations 
for why the information they submit to the 
EPA should remain secret. 

This change, along with the EPA’s new ability 
to review more chemicals, will give researchers 
and the public greater access to information 
about chemicals in the environment. 


How did the new bill arise? 

Lawmakers in Congress have debated whether 
— and how — to revise the TSCA since at least 
2005. But repeated efforts to overhaul the law 
failed as politicians debated how to expand the 
EPA’s authority to regulate chemicals without
stifling commercial innovation. 

The chemical industry initially opposed 
efforts to reform the TSCA, but gradually 
changed its position as state and interna- 
tional chemical regulation expanded. When 
a Republican proposal to amend the TSCA 
gained momentum in 2013, Democrats began 
to join the effort and a compromise slowly 
emerged. 

The House passed its own TSCA reform bill 
in June, and the Senate approved its legislation
in December. For months, lawmakers
have been hammering out a compromise
measure that blends aspects of the House bill 
with the major components of the Senate leg- 
islation. 

The result is a deal with broad bipartisan 
backing in Congress, and endorsements from 
the White House and industry. Many envi- 
ronmental groups have expressed cautious 
support for the legislation, but remain con- 
cerned about its recommended funding lev- 
els, the continuing requirement to consider 
costs when developing regulations, and provi- 
sions that could allow the federal government 
to override state regulation of chemicals.


What comes next? 
After the bill is enacted, the EPA will draw 
up rules for its new review process, which 
includes determining the fees that compa- 
nies will need to pay to submit chemicals for 
government review. The legislation allows the 
agency to collect up to US$25 million per year 
in fees to supplement its budget for chemical 
regulation, which is intended to cover roughly 
one-quarter of the total programme cost. 
The EPA must also figure out which of the 
85,000-odd chemicals in its inventory are still 
in use. The agency will survey companies that 
make and use chemicals to revise its list. Once 
that’s done, agency scientists can go through 
the list and prioritize those chemicals that 
merit a safety review.


CORRECTION 

The figure given for the planting of super 
soya bean in the News Feature ‘Frugal 
farming’ (Nature 533, 308-310; 2016) 
should have been 67,000 hectares, not 
1 million. In addition, the feature failed 
to make it clear that Jonathan Lynch was 
joking when he suggested that students 
should “drop acid”. 
South Korea’s Nobel dream


The Asian nation spends more of its economic output on research than anywhere else in the world. But it will need more than cash to realize its ambitions.


BY MARK ZASTROW 
Behind the doors of a drab brick building
in Daejeon, South Korea, a major
experiment is slowly taking shape.
Much of the first-floor lab space is under
construction, and one glass door, taped shut,
leads directly to a pit in the ground. But at the
end of the hall, in a pristine lab, sits a gleaming
cylindrical apparatus of copper and gold.
It's a prototype of a device that might one day
answer a major mystery about the Universe
by detecting a particle called the axion — a
possible component of dark matter.

If it succeeds, this apparatus has the potential
to rewrite physics and win its designers a
Nobel prize. “It will transform Korea, there’s
no question about it,” says physicist Yannis
Semertzidis, who leads the US$7.6-million-per-year
centre at South Korea’s premier
technical university, KAIST. But there's a catch:
no one knows whether axions even exist. It’s
the kind of high-risk, high-reward project
that symbolizes the country’s ambition to
become a world leader in basic research.

A prototype axion detector in Daejeon, South Korea. (Photo: Shin Woong-Jae)

South Korea is spending heavily to achieve 
its goal. In 1999, the country’s investment in 
research and development (R&D) totalled 
2.07% of its gross domestic product (GDP), just 
below the average for nations in the Organisa- 
tion for Economic Co-operation and Develop- 
ment (OECD). In the latest figures, the country 
has stretched out a clear lead at the top. The 
4.29% (63.7 trillion won, or US$60.5 billion) 
that South Korea invested in R&D in 2014 
outstrips runner-up Israel (at 4.11%), as well 
as regional competitor Japan and the United 
States. The biggest chunk of the money goes 
towards applied research and development in 
industry, but the government has made major 
investments in basic science, too. 

The big hope is that the country can inno- 
vate its way out of a looming economic crisis 
— and win a Nobel prize in the process. South
Korea aims to increase its investment to 5% 
of GDP by 2017, and last month, President 
Park Geun-hye's government announced that 
it would boost annual basic-science funding 
levels by 36% by 2018, to 1.5 trillion won. 
“Basic research starts with intellectual curi- 
osity among scientists and technicians, but 
it could be a source of new technologies and 
industries,” Park said. 

Can the country achieve its ambition? That 
depends who you ask. Some Korean scientists 
and policymakers doubt that it can sustain its 
high level of investment, and they worry that 
cultural barriers and bureaucracy are hinder- 
ing research. Young scientists are voting with 
their feet: according to figures released in 
2014 by the US National Science Foundation 
(NSF), nearly 70% of South Koreans who were 
awarded PhDs in the United States in 2008-11 
planned to stay there. 

Reorienting the nation’s science focus is no 
easy task, says Youngah Park, president of the 
Korea Institute of S&T Evaluation and Plan- 
ning (KISTEP), a government think tank in 
Seoul. The country has long been an industry- 
focused ‘fast follower’ — excelling at quickly 
adopting technologies and products, such as 
semiconductors and smartphones, and mak- 
ing them better and cheaper. Now, Korea needs 
a new model, she says. “That is a very challenging
and adventurous scheme for us.”


SHOCK AND AWE 
When the artificial-intelligence (AI) program 
AlphaGo beat Korean grandmaster Lee Sedol 
at the game Go this March, the impact on the 
national psyche was profound. The AlphaGo 
shock, as it came to be known, showed the coun- 
try that AI was the future: Korea must catch up 
to the likes of Google DeepMind in London,
which invented the Go-playing machine. 
Within days, President Park announced that 
the government would invest 1 trillion won in
AI by 2020, and prod the private sector into
investing a further 2.5 trillion won. The initia- 
tive’s cornerstone would be a public-private 
research institute involving corporations such 
as Samsung and LG. But many scientists criti- 
cized the approach as a knee-jerk reaction that 
would funnel government money into prod- 
uct development, not into the type of basic 
research that the country needs. 

The funding injection was typical of the 
strategy that has propelled South Korea's 
economy over the past few decades: the gov- 
ernment set goals and then channelled money 
to corporate partners to carry them out. The 
formula was devised by Park's father, dictator 
Park Chung-hee, who seized power in a 1961
coup. During his 18-year reign, he favoured
companies that grew into behemoths — con- 
glomerates, called chaebol in Korea, such as 
Samsung, LG and Hyundai, which remain the 
backbone of the nation’s economy today.

Powered by these industries, five decades of 
economic growth vaulted South Korea from 
developing-world poverty to membership 
of the group of 20 (G20) leading industrial 
nations. As the country moved painfully from 
dictatorship to democracy, government sup- 
port for research remained a bipartisan priority 
— mainly as a driver for further growth. Korea's 
corporate giants still dominate the R&D scene. 
According to KISTEP figures, of the 63.7 trillion
won spent on R&D in 2014, 49.2 trillion came
from private enterprises. That includes more
than half of the 11.2 trillion won spent on basic
research. Much industrial research happens 
behind closed doors, although partnerships 
with academia are on the rise. 

Meanwhile, government-funded labs also 
worked mainly towards developing industrial 
technologies, and blue-sky, basic research 
remained an afterthought. “Politicians don't 
distinguish between R&D in technology 
and support in basic science,” says physicist
Doochul Kim. Until recently, he says, “there 
has been no support in basic science, basically”. 

Change arrived during the run-up to the 
2007 presidential election, when a group of 
researchers pitched an idea to the nation’s 
leading politicians: that the country build an 
Institute for Basic Science (IBS). The organi- 
zation would be Korea’s answer to Germany's 
academically elite Max Planck institutes and 
Japan’s RIKEN centres. “It was the first time 
that scientists went forward and suggested their 
own big project for the nation,” says Youngah
Park, who was a legislator with the conserva- 
tive party at the time. The institutes would be 
part of an even bigger plan to create a research 
and business megahub called the International 
Science and Business Belt — and this became 
government policy when conservative candi- 
date Lee Myung-bak won the election. 

Political wrangling subsequently forced the 
government to scale back some of its plans, but 
IBS survived, in modified form. Fifty IBS cen- 
tres, one-third of them in Daejeon, would be 
funded at an average of 10 billion won a year 
each for at least 10 years — a boon for research- 
ers, who would be offered secure support to 
pursue their ideas. “We have large funding 
and you can do whatever you want to,” says 
Kim, now president of the IBS. Today, 26 of 
the centres have opened, with the rest hoped 
to follow by 2021. 


AXION RACE 

The Center for Axion and Precision Physics 
(CAPP) at KAIST is one of them. Semertzidis 
became head of the centre in 2013, moving 
from Brookhaven National Laboratory (BNL) 
in Upton, New York. 

In its quest to find the axion, CAPP is chas- 
ing a high-profile rival in the United States: 
the Axion Dark Matter Experiment (ADMX), 
based at the University of Washington in Seat- 
tle. “Very smart people, absolutely,” Semertz- 
idis says, with a disarming grin. “But we'll win, 
nonetheless — absolutely.” 

If axions are indeed part of dark matter, they 
should be all around us. CAPP’s design — like 
that of the ADMX — uses a cavity that should 
resonate at the axion’s mass, with strong mag- 
nets outside it that cause the particles to con- 
vert into two photons and pop into sight. But 
physicists don’t know what the axion’s mass is, 
so they have to scan for it, tuning the resonant 
frequency of the cavity with rods of copper or 
sapphire. It will take years for a single device to 
cover the whole range of possible frequencies. 
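
The link between the unknown mass and the cavity's resonant frequency is the Planck relation, f = mc²/h. The short Python sketch below shows the conversion and gives a rough feel for why a scan takes so long. The quality factor and the mass range in the usage note are illustrative assumptions, not figures from CAPP or ADMX, and the step-count model (one tuning step per cavity bandwidth) is a simplification introduced here.

```python
import math

# Exact SI defining constants
PLANCK_H = 6.62607015e-34        # Planck constant, J s
ELECTRON_VOLT = 1.602176634e-19  # one electronvolt in joules


def axion_frequency_hz(mass_micro_ev):
    """Photon frequency matching a rest-mass energy given in micro-electronvolts."""
    energy_joules = mass_micro_ev * 1e-6 * ELECTRON_VOLT
    return energy_joules / PLANCK_H


def scan_steps(mass_lo_micro_ev, mass_hi_micro_ev, quality_factor):
    """Rough number of tuning steps needed to cover a mass range.

    Assumption (hypothetical model): each step shifts the resonance by one
    cavity bandwidth, f / Q, so the step count is Q * ln(f_hi / f_lo).
    Frequency is proportional to mass, so the mass ratio can be used directly.
    """
    return math.ceil(quality_factor * math.log(mass_hi_micro_ev / mass_lo_micro_ev))
```

Under these assumptions, a mass of 1 micro-electronvolt corresponds to a frequency of roughly 242 MHz, and covering one decade in mass with a hypothetical quality factor of 100,000 would take a few hundred thousand separate tuning steps.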

CAPP has at least a year of development left,
whereas ADMX is already beginning opera- 
tions, giving it a significant head start. But 
CAPP plans to build not one, but seven cavities 
— all in that hole in the ground, down the hall. 
And it has more powerful magnets, developed 
at BNL. “We'll do it seven times better, because 
of the sheer power of money,” says Semertz- 
idis, who thinks that his team can leapfrog the 
ADMX within five years. ADMX leader Leslie
Rosenberg suspects that it could, too. “CAPP 
is by far our most credible competitor,” he says.

Whatever the outcome, Rosenberg says that 
CAPP’s progress is a milestone for Korean 
physics. The country’s willingness to spend 
landed foreign talent and technology. “These 
new IBS centres have moved them into the 
top tier,” he says. The other 25 existing cen- 
tres are pushing into fields ranging from gene 
editing to nanomaterials and pure mathemat- 
ics. Roughly one-third of the IBS’s budget 
is devoted to one flagship effort — the Rare
Isotope Science Project (RISP) in Daejeon,
which seeks to build a heavy-ion accelerator
for nuclear science and biomedical research.

[Figure: Science in South Korea. Industrial research and development (R&D) has long been a priority for South Korea as a driver of economic growth. In the past decade or so, more emphasis has been placed on basic research. South Korea’s spending on R&D has soared to more than 4% of its gross domestic product (GDP) — more than any other country in the world and double that of China and the European Union.]

South Korea is also investing in basic- 
research facilities outside of IBS. The 
Pohang Accelerator Laboratory is receiving 
a 400-billion-won upgrade to house an X-ray 
free-electron laser that can image materials 
on nanometre and femtosecond scales. And
in 2014, the nation completed construction 
on a 107-billion-won, state-of-the-art Antarctic
research centre in Terra Nova Bay,
which quickly became the envy of the polar- 
research community. “It was like a spaceship 
had landed,” said US National Science Foun- 
dation polar-science head Kelly Falkner in 
2014, months after attending the sleek facility's 
opening ceremony. “It’s amazing to see what 
they can do by starting from scratch.” 

Rosenberg, for one, says that Korea is wise to 
invest in the IBS centres. “If they can continue 
to afford it, I think the pay-off is going to be 
enormous.” And if they find the axion? “Oh 
my goodness, well, let’s say it would instantly 
be a Nobel prize.”

And that is something that this country 
wants very much indeed. 


NOBEL DREAMS 

Last October’s Nobel-prize announcements 
triggered a wave of disappointment — again. 
There were no awards for South Korean 
researchers, but scientists in Japan, the 
nation’s most bitter regional rival, collected 
shares in two: Satoshi Omura for developing
a therapy for roundworm, and Takaaki Kajita
for showing that neutrinos have mass. “Why
no Korean Nobel laureates?” asked a headline
in The Korea Times.

22 | NATURE | VOL 534 | 2 JUNE 2016

[Figure labels from ‘Science in South Korea’: South Korea; Israel; US$60.5 billion, 2014; experimental development, US$38.4 bn.]

The question came up again at an oversight 
hearing of South Korea’s parliamentary sci- 
ence committee, held that week. One member 
of parliament compared the full list of the two 
countries’ Nobel laureates in science to a dis- 
mal football result: Japan 21, South Korea 0. 
“When will IBS score a goal?” he asked Kim. 




In some political quarters, IBS was origi- 
nally hailed as a way to level the Nobel score, 
but Kim has pushed back against that, arguing 
that the “Nobel complex” leads to shortsighted
policies that chase hot topics and demand 
instant results. “We are only four years old,” 
he told the committee. He noted that it took 
decades to develop the infrastructure at Japan's 
Kamiokande Observatory near Hida, where 
the neutrino breakthrough was made. “So you 
shouldn't ask that question,” he said. 

Korea did seem poised for a Nobel just 
over a decade ago, when stem-cell scientist 
Woo Suk Hwang claimed to have derived the
world’s first stem-cell lines from cloned human 
embryos. But glory quickly turned to shame 
when Hwang was first found guilty of ethics 
violations in the way he collected women’s 
eggs for research, and then discovered to have 
fabricated some of his work. The scandal left 
the impression that the country’s oversight of 
research ethics and integrity was lax. 
Scientists in Korea say that the scandal has 
brought about positive changes. Slowly, more 
Korean journals have begun to issue retrac- 
tions, says Eric di Luccio, a structural biologist 
at Kyungpook National University in Daegu, 
and many universities are using the plagiarism- 
detection site turnitin.com to check papers 
and theses. More attention is also being paid 
to bioethics, says Jin-Soo Kim, director of the 
IBS Center for Genome Engineering at Seoul 
National University. “Before the Hwang scan- 
dal, in the laboratory, people would just draw 
blood and do experiments,” he says. “Now it’s
recognized that you shouldn't do it without 
approval” from an institutional review board. 
But Kim says that one ‘Hwang-gate’ reform 
is now holding Korea back: in the wake of the 
scandal, the government enacted a ban on 
human-embryo research, with only occasional 
exceptions granted for stem-cell studies. Kim 
has been at the forefront of developments in 
CRISPR-Cas9 gene editing, a technique that 
is revolutionizing biomedical research, but he 
has found himself unable to use the technology 
for research in human embryos, even as teams 
in China, the United Kingdom and elsewhere 
forge ahead with such work. “It’s a pity,” says 
Kim, who has instead focused his efforts on
engineering pigs and plants.

[Figure text: Publications by discipline. South Korea has one of the world’s highest proportions of researchers. South Korea has more than doubled its academic publication output since 2005, overtaking similarly populated Spain — but lagging behind its regional rival Japan. Scientists publish most in chemistry, …]

Researchers also chafe at other regulations. 
At public universities, tenure and promotion 
decisions are often based in part on evalua- 
tions that count papers by fractional contribu- 
tion: a four-author paper, for example, would 
earn a scientist a small fraction of the credit 
of a single-author one. The system is “rather 
counterproductive”, says di Luccio. It dissuades
scientists from taking part in the large inter- 
national collaborations of modern big-budget 
science, and encourages them to publish single- 
author papers in less-prestigious national jour- 
nals to juice up their evaluation scores. “I did it 
three times already,” he says. “This evaluation
system is the exact opposite of what it should be 
to elevate scientific research.” The government 
says that universities and other organizations 
are free to implement their own standards of 
evaluation and that nationally funded pro- 
grammes use more qualitative measures. (IBS 
insulates researchers from paper counts.) 

Some scientists see deeper problems with 
the academic culture, rooted in Korean society 
at large. Secondary and undergraduate educa- 
tion focus on test-taking and emphasize defer- 
ence to teachers — tendencies that academics 
bemoan as discouraging the creativity and 
debate necessary in a lab. “When new students 
come, they are quiet — that is the Korean cul- 
ture,” says Jin-Soo Kim, who counters this by 
requiring his students to ask questions before 
they can leave group meetings. 

Korean customs were a turn-off for Young-Im
Kim, who was doing a physics postdoc at
the University of Oxford, UK, in 2014, when a
friend sent her a link to a job posting at CAPP. 
Although she thrilled to the research, she was 
hesitant to return to her home country because 
of the hierarchical nature of Korean culture. 
“The only reason I applied is because of Yannis,” 
she says. “If he were Korean, I wouldn't have.”
She is now a research fellow at CAPP. 

Cultural barriers can have a disproportion- 
ate impact on female scientists. One exam- 
ple, says Young-Im Kim, is Korean drinking 
culture, in which men often stay out late with 
their male co-workers. Important workplace 
decisions are often made at such events, effec- 
tively excluding women. Such problems could 
go some way towards explaining why Korea 
has a wide gender gap in its scientific work- 
force. According to OECD figures, in 2010 
less than 17% of researchers in South Korea 
were women. In Portugal, the OECD leader, 
the fraction is 45.5%. 


STRETCHED RESOURCES 
Policy analysts warn that research spending 
may slow in the future, as Korea faces the likely 
prospect of an economic slowdown and, in the 
long term, a social-welfare net stretched to sup- 
port an ageing population with one of the low- 
est birthrates in the world. And although R&D 
expenditure continues to grow as a percentage 
of GDP (see ‘Science in South Korea’), it is actually
shrinking when viewed as a percentage of
government spending, says Youngah Park. 
“That is a sign that we have no room to increase 
this government R&D budget anymore.” 
Some critics say that spending is now too 
focused on IBS: that the bold plan is sucking
up basic-research funds and reducing the pot 
of money available through other grants, cre- 
ating a situation of haves and have-nots, and 
potentially quenching original — perhaps 
Nobel-worthy — projects. The shrinking 
grant pool is a legitimate issue, acknowledges 
Doochul Kim. But he thinks that the govern- 
ment should address it by shifting funds from 
applied research. It continues to subsidize 
such research when many say that it should 
be focusing on long-term basic research that 
industry wouldn't pursue. 

“There is some excellent science done in 
Korea, but still, in general, the average is not 
as good as the advanced countries like the US, 
UK and Germany,” says Jinwoo Cheon, director
of the IBS Center for Nanomedicine at Yonsei 
University in Seoul. To spur investment in basic 
research, he adds, scientists have to convince 
the public and government officials of its intan- 
gible benefits. “Excellence in basic science is not 
easy to have, and it has to be rooted in our soci- 
ety — curiosity-driven research, and knowing 
different ways of thinking.”

Sunchan Jeong, director of RISP, says that if 
there is such a thing as a recipe for winning a 
Nobel prize, then IBS has got it. “Select some 
competitive fields in the world and concentrate 
their investment on it. That’s a good way.” But
there are no guarantees, he cautions: “The 
people in Korea should understand that scien- 
tific results are not necessarily repaid by some 
greater prize like the Nobel.”


Mark Zastrow is a writer based in Seoul. 
Mathematician Lydia Bourouiba
uses high-speed video to break 
down the anatomy of sneezes 
and coughs, and to explore how 
diseases spread. 


After a sneeze, large droplets of 
saliva and mucus shoot out of the 
mouth, but fall relatively quickly. 


A turbulent cloud carries
smaller droplets and
allows them to drift for
up to 8 metres.


WHERE SNEEZES GO 


BY CORIE LOK 


So, how do you get your research subjects to sneeze on cue?

“That's a question I get a lot,” says Lydia Bourouiba with an 
easy smile. The solution turns out to be surprisingly simple: 
just take a small, rod-shaped device, use it to tickle a subject’s 
nostril for a few seconds, and — achoo! 

For Bourouiba, a mathematician and fluid dynamicist, that sneeze 
is the pay-off. She and her team at the Massachusetts Institute of Tech- 
nology (MIT) in Cambridge record the explosive aftermath in gross 
detail using one or sometimes two cameras running at thousands of 
frames per second. Played back in slow motion, the videos reveal a 
violent explosion of saliva and mucus spewing out of the mouth in 
sheets that break up into droplets, all suspended in a turbulent cloud. 

The videos that Bourouiba has recorded in this way allow her to 
measure everything from the diameter of
the droplets to their speed — data that help
her to learn more about how these particles
carry viruses and other pathogens to their next host. She has shown
that sneeze and cough particles can travel the length of most rooms
and can even move upwards into ventilation shafts — suggesting that
microbes in the droplets could potentially spread farther and over
longer periods of time than current theories suggest.

A sneeze captured on high-speed video.

Ultimately, says Bourouiba, her goal with this work is to ground 
epidemiology and public health in physics and mathematics. When 
trying to keep diseases from running rampant, she says, “we want to 
be giving recommendations that are based on science that has been 
tested in the lab.” In practical terms, such insights could lead to maps
showing the contamination risks in the vicinity of infected people, pro- 
tective equipment optimized to shield hospital workers from specific 
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kinds of germs, and better predictions of how diseases move through 
a population. 

Bourouiba pursues this goal with the same energy and ambition 
that leads her to fill her leisure time with week-long bike trips,
mountain climbing (she ascended Tanzania's Mount Kilimanjaro in 2011)
and winter camping at −20 °C. Although she is hardly the first
researcher to use high-speed video to study fluid dynamics, she is the 
first to realize its potential in the respiratory field, says David Ku, a 
biofluid-mechanics researcher at Georgia Institute of Technology in 
Atlanta. Bourouiba’s approach could be transformative in the field, 
says Ron Fouchier, a virologist at the Erasmus University Medical 
Center in Rotterdam, the Netherlands. “This kind of physics is abso- 
lutely needed to understand how transmission works.” 


A FLUID CAREER

Bourouiba has been a natural explorer as far back as she can remember. 
As a child in France, she immersed herself in books about science and
nature, including a biography of Albert Einstein. She soon fell in love 
with mathematics and physics, and made them her major subjects when 
she earned her undergraduate degree in France and Montreal, Canada. 

But during her graduate work in fluid mechanics at Montreal's McGill 
University, as she focused on narrower and narrower theoretical ques- 
tions about turbulent flows, Bourouiba began to feel the itch for some- 
thing more. She had spent some of her early years in Algeria during the 
civil war of the 1990s, and vividly remembered the turmoil and misery 
that she had witnessed there. “We know what the worst is, we saw a lot 
of it,” she says. “But what can we as a species do to push that boundary
of what we can achieve, in terms of making the world a better place?” 

In her search for an answer, Bourouiba soon homed in on health 
and epidemiology. This was the mid-2000s, and emerging diseases 
were all over the news. Severe acute respiratory syndrome (SARS) had 
killed nearly 800 people around the world in 2003, polio was making 
a comeback and avian flu was jumping across to humans. Infectious 
diseases seemed to Bourouiba like the perfect way to combine all of 
her interests and expertise. 

She was tentative at first. A career in fluid mechanics promised to 
be secure and certain, whereas a head-first dive into biology seemed 
like a huge risk. But one day, about halfway through her PhD, she was 
mulling this conundrum as she made her way up the wall at a rock- 
climbing gym. “So what?” Bourouiba suddenly said to herself as she 
reached for the next handhold. “You can't make decisions out of fear.” 


VIOLENT EVENTS 
Having come so far, Bourouiba saw her fluid-dynamics PhD to 
completion in 2008. But from there she managed to land a postdoc 
appointment in mathematical epidemiology at York University in 
Toronto, Canada, where she started thinking about sneezes and coughs. 
These ‘violent expiratory events’ (as one of Bourouiba’s papers 
calls them) were assumed to be one of the main ways that respiratory 
diseases spread. But how, exactly? Epidemiological studies estimate 
how a disease is transmitted on the basis of people’s movements and 
activities at the time they got infected. Did they contract the disease 
by direct, person-to-person contact, such as shaking someone's germ- 
covered hand, or from contaminated surfaces such as doorknobs? Was 
it through large droplets that make a short leap from one respiratory 
tract to another, or through smaller aerosol particles that are sus- 
pended in air and can travel farther before being inhaled? Or was the 
route some combination of these modes? 
Such studies have helped researchers to work out that measles is 
typically spread by aerosols and that Ebola is transmitted mainly 
through direct contact with infected bodily fluids. […]
through close contact, yet the 2003 outbreak showed at least some
evidence of airborne transmission. And some researchers think that
Ebola viruses might travel through air to some degree.

At York, Bourouiba became convinced that these uncertainties 
could be reduced by pinning down some key details about the physics 
of sneezes and coughs that conventional disease-transmission models 
are missing. 

In 2010, a postdoc appointment at MIT gave her a chance to start 
filling those gaps with hard data. Up to that point, she had worked only 
on theory — but now she plunged into experimental research, learning 
through trial and error the subtleties of high-speed video and lighting 
to capture a sneeze. “Mathematicians are often uncomfortable in a 
lab setting,” says John Bush, a fluid dynamicist who was her mentor at 
MIT. “Lydia really took to it.”


SUSPENDED SPRAY 

One thing that Bourouiba particularly wanted to pin down was the 
size distribution of the droplets coming out of the mouth, because 
size affects how many microbes a droplet can carry and how far it can 
travel through the air. 

For her first set of experiments, published in 2014, she wanted to 
look at the entire spray of droplets. Bourouiba posted adverts around
the MIT campus to recruit volunteers, and filmed the coughs and 
sneezes of about ten healthy people. After much tinkering with cam- 
era positions, backgrounds and lighting levels — at one point, the 
lights made the room uncomfortably hot for participants — Bourouiba 
captured videos that showed that the droplets were propelled out of 
the mouth in a turbulent, buoyant cloud. The cloud grew and slowed 
down as it pulled in air from the environment, lifting and carrying the 
droplets away from the sneezer. 

The video evidence contradicted conventional thinking about 
sneezes, which held that larger droplets would fall to the ground within 
1-2 metres, and that only the smaller ones would stay aloft as airborne 
aerosols. Feeding her video evidence into her mathematical models, 
Bourouiba concluded that, thanks to the cloud dynamics, many of 
the larger droplets can travel up to 8 metres for a sneeze and 6 metres 
for a cough, depending on the environmental conditions, and stay 
suspended for up to 10 minutes — far enough and long enough to 
reach someone at the other end of a large room, not to mention the 
ceiling ventilation system. 

That conclusion has implications for health-care workers, says 
James Hughes, an infectious-disease epidemiologist at Emory Uni- 
versity in Atlanta. If a disease is thought to be transmitted within 
1-2 metres, workers might assume that they are safe beyond that zone. 
“I think maybe we need to be a little bit more circumspect about that,”
he says. 

For Bourouiba’s next set of experiments, she zoomed in closer to
the mouth to film a 150-millisecond-long sneeze. Videos taken from 
the side and top at up to 8,000 frames per second revealed that the 
fluid breaks up in steps, like a slow-motion explosion produced by 
Hollywood: the fluid emerges from the mouth in sheets, which are 
then punctured and form rings as they are stretched by the airflow. 
The rings fracture, leaving filaments. Little beads of fluid form on the 
filaments, which elongate and fragment to finally produce droplets. 

Bourouiba was surprised to find so much happening to the fluid 
outside the mouth — it countered the prevailing assumption that 
droplets exit the mouth fully formed. To Gerardo Chowell, a math- 
ematical epidemiologist at Georgia State University in Atlanta, this is 
an important finding because it means that droplet formation could 
be strongly influenced by environmental conditions such as humidity 
and temperature. And that could help to explain why some diseases, 
such as flu, tend to occur more frequently at certain times of the year, 
he adds, perhaps because the ambient conditions favour the spread 
and survival of certain microbes. 

Bourouiba’s research advances previous work measuring sneeze and 
cough droplet sizes, says Ku. Fluid particles can travel varying distances 
depending on a lot of different parameters, he says. “If I just tell you the
size of the particles, I can't tell you where they're going to go. Her work 
actually shows where they go, with a real sneeze.” 


THE NEXT LEVEL 

A back injury last year has curtailed some of Bourouiba’s more ambi- 
tious outdoor activities. But at work, she and her team are preparing to 
move into a newly built lab with a biosafety-level-2+ containment room, 
which will allow them to study the sneezes and coughs not just of healthy 
participants, but of people infected with colds and flu. In preparation 
for those studies, she has hired a microbiologist who can help the team 
to determine the microbial load in droplets and how long pathogens 
survive in the air or on surfaces while maintaining their ability to infect. 

Answering this question will be crucial, says Hughes. “We need to 
learn more about the concentration of microbes in droplets of varying 
sizes and the infectious doses of a lot of these pathogens.” The contain- 
ment room will also allow Bourouiba to control the airflow, temperature 
and humidity so that she can explore the behaviour of emitted droplets 
in environments that mimic hospitals, aeroplanes or the tropics. 

Bourouiba’s ultimate aim is to compile all of her data into a math- 
ematical model that could be used by public-health officials to identify 
the most likely routes of transmission and how to reduce the risk of 
disease spread. The model would suggest, for example, whether the 
biggest risk of contamination is from the air or from surfaces, or how 
to change the airflow or temperature to minimize the risk in a hospital. 
It could predict whether a particular person is at high risk of being a 
‘superspreader’ and should be quickly placed in a containment unit. 
During an emergency situation, when a new disease is spreading but it’s 
not clear how, it might also help officials to identify the most dangerous 
environments, such as aeroplanes, so that people can avoid them. Then, 
as the first infected patients are tested and more is learned about the 
pathogen, those data could be incorporated into the model to refine 
the risk assessment. 

Chowell, who models the spread of infectious diseases, hopes that 
Bourouiba’s work could eventually be used to give diseases an ‘airborne
score’. Knowing that a pathogen is transmitted by airborne aerosols, say,
85% of the time could give public-health officials a better idea of how fast
and far an outbreak will grow, compared with one that’s just 5% airborne,
he says. “Models require data, and I think the efforts of Bourouiba and 
others will help us better calibrate the design of these models, and this 
will have an impact on our ability to forecast disease spread in real time.” 

That may depend on the disease, however. Work from Donald 
Milton, an environmental-health scientist at the University of Maryland
School of Public Health in College Park, suggests that Bourouiba’s
approach may not have much impact on the study of influenza because
people with flu rarely sneeze. Studying people with the common cold
might be more fruitful, he says, because they sneeze more often. 

Milton also cautions that focusing on sneezes and coughs may not 
capture the whole story of respiratory-disease transmission. Breathing 
and talking are important to consider as well. He and his team have 
detected flu viral RNA in particles that were simply exhaled by patients, 
and they have even cultured viruses from such particles. Bourouiba says 
that she can study breathing emissions using her methods if they turn 
out to be a factor, but she first wants to study infected people to see which
are the most important emissions to examine. 

One occupational hazard for Bourouiba is that it’s difficult to escape 
her work: whenever she hears a sneeze on a plane or in the classroom,
she can't help thinking about the droplets flying through the air. There 
is not much she can do about that, but it does remind her why she 
became so fascinated with fluid mechanics when she was an under- 
graduate: fluids are everywhere. 

Her videos might earn her the nickname of ‘the sneeze lady’, a student
once warned her. But she says she doesn't mind. “If people get
interested in the topic because of the humorous aspect, I have no
problem with that.”


Corie Lok is an editor for Nature in Boston, Massachusetts. 


Researchers prepare to filter crystals at an Amgen small-molecule-manufacturing facility.


Drug companies must 
adopt green chemistry 


John L. Tucker and Margaret M. Faul describe how they transformed their 
company to save time and money by making drugs sustainably. 


In the past decade, many large pharmaceutical
companies have moved to using
green-chemistry practices for drug discovery,
development and manufacturing.
These firms include ours, Amgen, and others
such as the Merck Group, Abbott, Johnson &
Johnson and Roche. Ranking systems such
as the Dow Jones Sustainability Indices and
the Pacific Sustainability Index track how
well firms are doing. This shift is being
driven by the realization that processes that
are cheaper and environmentally superior
deliver a competitive advantage.

Success depends on instilling a culture of 
sustainability into a firm. Staff at all levels 
— from management to lab scientists — 
need to understand the concepts of green 
chemistry and how they might be embraced 
to everyone's benefit. The future rewards of 
major operational changes need to be visual- 
ized and funding must be put in place before 
any results are realized. 

Here we set out how to build such a culture,
based on our experience at Amgen, which is
headquartered in Thousand Oaks, California.
Change cannot be achieved overnight, but
we found that a series of small wins can build
momentum until the vision becomes clear
and is accepted. By following a combination
of ‘bottom-up’ and ‘top-down’ approaches,
Amgen rose from having a C+ rating on the
Pacific Sustainability Index in 2007 to become
one of the top-rated (A+) companies in
2012 (see ‘Sustainability scores’). The
ingredients are: an empowered team with
management support to lead the transformation;
staff education, awareness and recognition;
investment in technology; development
of metrics and tools; and external collaboration
and outreach.

Our approach focuses on the ‘triple
bottom line’ (profit, people and planet)
and the 12 principles of green chemistry,
which include minimizing ingredients,
waste, toxicity and energy. Each of
these principles can be applied to a sector 
or product. For instance, a detergent’s for- 
mula might be redesigned so that it degrades 
without accumulating, persisting or releas- 
ing toxic chemicals into the environment. 

Drug development has its own challenges. 
A pharmaceutical’s therapeutic response 
depends on its molecular structure. Existing 
drugs have already survived a battery of toxi- 
cological and therapeutic tweaks, as well as 
clinical studies that took years to complete. 
Redesigning a drug to be more readily degra- 
dable could become an entirely new develop- 
ment project — changing its structure might 
alter its function. But many green-chemistry 
principles can make processes more effi- 
cient, for example by reducing the number 
of reaction steps, or by using less energy or 
materials (see ‘Green-chemistry principles
to drive sustainability’).

The challenge is to catalyse a new 
norm across the industry while meeting 
institutional expectations and demonstrat- 
ing value. 


SEVEN STEPS 

Empower champions. The first step taken 
at Amgen was to create a high-level green- 
chemistry team that spanned the company’s 
many functions. The aim was to define and 
entrench green-chemistry expectations 
across the company — framed as ‘how, why, 
what and where’. The team initially com- 
prised six scientists — a chair (one of us, 
J.L.T.), representatives from process and ana- 
lytical chemistry, engineering, environment, 
health and safety, and drug-production 
technologies — and was supported by an 
executive sponsor (the other of us, M.M.F.).
Medicinal-chemistry teams and biological- 
molecule representatives were added later. 
All steps of drug discovery, development 
and manufacturing at Amgen were con- 
sidered across the company’s multiple sites. 
Resources had to be secured for communi- 
cation, collaboration and development of 
green methodologies and technologies. 


Raise awareness. The green-chemistry 
team at Amgen set up a series of lectures 
by innovators and thought-leaders from 
within and outside the company to spread 
knowledge of green-chemistry principles, 
their potential for drug development and 
examples of good practice. Websites and
visuals were created and disseminated. 
For example, notices highlighting the 
12 principles were placed on hoods and in 
laboratories, and green-solvent and reagent- 
selection guides were supplied. Challenges 
had to be acknowledged, including practi- 
cal difficulties in using sustainable methods 
and materials, the importance of maintain- 
ing industry competitiveness during the 
transition, as well as the need for transpar- 
ency in meeting regulatory expectations. 
The greatest difficulty we encountered 
was providing a vision of the future state 
of the organization and convincing people 
that it was achievable and worth pursu- 
ing. Employees needed strong incentives 
to adopt sustainable principles in their 
already challenging careers. Productivity 
and operational efficiency were expected 
to rise regardless of change, and developing 
new drugs across the industry was becoming
increasingly difficult. Converting sceptics
was hard. Scientists demanded data,
precedents and straight answers as to why
or how an initiative should or should not be
pursued. Business leaders wanted figures on
efficiency, cost and environmental impact.


Expand collaborations. To share knowledge,
Amgen joined the American Chemical
Society’s Green Chemistry Institute’s
Pharmaceutical Roundtable and the IQ
Consortium’s Green Chemistry Working
Group. These interactions provided:
industry-harmonized tools, such as solvent and
reagent guides; insight into industrial
green-chemistry strategies and practices; access to
metrics; opportunities to influence academic
research; and discussions with governmental
agencies such as the US Environmental
Protection Agency, the Food and Drug
Administration and the National Science
Foundation. Getting many companies and
regulators around the same table helped to
smooth the adoption of green-chemistry
principles, ensuring that cheap, safe and fast
access to new therapies would continue.


SUSTAINABILITY SCORES

The Pacific Sustainability Index rates information reported on the websites of the 20 largest drug
companies. Scores out of 100 are given in areas including environmental and social measures, and are
based on a standard questionnaire. The top 4% receive A+ ratings; the bottom 4%, F.

Merck Group: 54 (out of 100)


Amgen scored highly in
social and environmental
intent by reporting on
areas that it still needs to
improve, such as water
and air emissions.


Pfizer posted a lot of 
information, but more 
specifics on topics such as 
employee relations would 
have raised its scores. 


UCB published its first 
Corporate Social 
Responsibility Report in 
2011 that stated goals for 
improvement. 


2012 index based on companies in the Forbes 2010 Drugs and Biotechnology sector list. 


Define metrics. We collated and shared
quantitative tools for identifying internal 
strengths and weaknesses and best prac- 
tices. Large pharmaceutical firms track the 
efficiency and waste of their drug portfolios 
through metrics such as the E factor, which 
measures kilograms of waste generated 
per kilogram of product, or process mass 
intensity (PMI), a similar metric that meas- 
ures the total mass of materials per mass of 
product. The Amgen green-chemistry team 
developed metric calculators for the elec- 
tronic notebooks used by scientists in the 
lab to pinpoint the most inefficient steps and 
operations within projects and to compare 
progress as a project moves through process 
development. 

The metrics exposed a clear correla- 
tion between reducing waste and lowering 
process costs across Amgen. For example, 
using fewer materials in one synthetic pro- 
cess that was being developed for a new 
clinical candidate reduced the E factor by 
82% and the cost by 83% while also improv- 
ing the overall yield. 
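
The two metrics named above can be sketched in a few lines of Python. This is only an illustration of the definitions given in the text, not Amgen's notebook calculator: the step names and masses are invented, and the E factor here assumes that every material charged to a step that does not end up as product counts as waste (under that assumption, E factor = PMI − 1).

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One synthetic step; masses in kilograms (hypothetical values)."""
    name: str
    inputs_kg: float   # total mass of materials charged to the step
    product_kg: float  # mass of isolated product


def process_mass_intensity(step: Step) -> float:
    """PMI: total mass of materials used per mass of product."""
    if step.product_kg <= 0:
        raise ValueError("product mass must be positive")
    return step.inputs_kg / step.product_kg


def e_factor(step: Step) -> float:
    """E factor: kilograms of waste per kilogram of product.

    Assumption: waste = inputs - product, so E factor = PMI - 1.
    """
    if step.product_kg <= 0:
        raise ValueError("product mass must be positive")
    return (step.inputs_kg - step.product_kg) / step.product_kg


def least_efficient(steps: list[Step]) -> Step:
    """Return the step with the highest PMI, i.e. the first target for rework."""
    return max(steps, key=process_mass_intensity)


def percent_reduction(before: float, after: float) -> float:
    """Relative improvement of a metric between two process versions."""
    return 100.0 * (before - after) / before


# Hypothetical three-step route, used only to exercise the functions.
steps = [
    Step("coupling", inputs_kg=120.0, product_kg=4.0),
    Step("deprotection", inputs_kg=45.0, product_kg=3.0),
    Step("crystallization", inputs_kg=30.0, product_kg=2.5),
]
worst = least_efficient(steps)
```

Comparing `percent_reduction` for the E factor of an early route against a later one gives the kind of figure quoted in the example above, and tracking it per step shows where the remaining waste is concentrated.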


Recognize achievements. An internal 
award programme raised awareness among 
employees of the impact and advantages 
that green chemistry brings. The competi- 
tion sought and broadcast the best exam- 
ples of how adopting the principles led to 
better outcomes. Leaders presented prizes 
and the winners served to inspire oth- 
ers. For example, researchers working on 
etelcalcetide (which is being assessed as 
a treatment for a complication of chronic 
kidney disease) received Amgen’s green- 
chemistry award for developing a process 
that reduced their organic-solvent use by 
more than 400,000 litres and shortened the 
projected manufacturing processing time. 


Invest in technology. We explored new 
methods. For example, we switched to using 
enzymes as reagents in the synthesis of small 
molecules, moving away from conventional
reactions catalysed by transition metals. This 
reduced the number of steps and increased 
reaction throughput. For example, enzyme 
catalysis allowed us to make a key fragment 
of a drug candidate in early development, 
reducing the time to manufacture by 80%. It 
also eliminated volumes of organic solvent
used during chromatographic purification 
of small molecules, doubled the yield and 
reduced the cost of the starting material by 
more than 99%. 

A standard procedure for oxidizing
double bonds to aldehydes is ozonolysis
(aldehydes are key chemical groups
that enable the assembly of molecules).
In one case study, our initial synthesis of
an aldehyde intermediate used a flammable
solvent in the presence of oxygen.
Using a ‘continuous flow’ process (in
which the conditions of a stirred reactor
are optimized in real time to maximize the
yield) rather than batch mode allowed us
to process large quantities of material in a
short time (5 kilograms of aldehyde intermediate
in 18 hours) without building up
dangerous amounts of reagents, intermediates,
solvents, ozone and oxygen gas.

Even simple changes, such as using
large, disposable plastic rather than stainless
steel vessels for manufacturing biologic
drugs (made using recombinant
DNA technology), saved time, space,
effort and money. Although it creates
more solid waste, single-use vessels
do not need rooms and resources such as
water or steam to clean or sterilize them.
Costs fall and production capacity can grow
without increasing (and even by decreasing)
the waste footprint of the plant.


GREEN-CHEMISTRY PRINCIPLES TO DRIVE SUSTAINABILITY

CONCEPT | GOOD FOR PLANET | GOOD FOR PROFIT
Atom economy | Fewer by-products. | More value from less material.
Minimize solvent use | Less waste, less energy. | Higher throughput.
Optimize reagents | Recyclable reagents and catalysts minimize volume of chemicals needed. | Higher efficiency.
Convergent synthesis (produce several pieces of a molecule at once) | Increases efficiency, saves energy. | Higher efficiency, fewer operations.
Reduce energy use | Less pollution from power generation and transport. | Shorter, more efficient processes under milder conditions.
Analyse reactions in real time | Reduces exposure or release to environment. | Increases throughput and process efficiency, fewer reworks.
Prioritize safety | Non-hazardous materials reduce risk of exposure, release, explosions and fires. | Reduces potential harm to workers, down time and need for special control measures.


Promote outreach. To spread the sustain- 
ability mindset across the industry, it will be 
crucial to work with academics to prepare 
the next generation of green chemists and 
with regulators to assess and reward efforts. 
To this end, Amgen scientists regularly give 
talks in universities on green chemistry. The 
public and investors value sustainable prac- 
tices (albeit in ways that are hard to quantify) 
so it is important that they know of the com- 
pany’s commitment. To this end, corporate 
reporting must be transparent and present 
an unvarnished and accurate view. 

All this hard work has paid off. In 2013, 
Amgen was declared a top performer in the 
pharmaceutical industrial segment by the 
Dow Jones Sustainability Index, and placed 
21st in Newsweek's 2015 ranking of green US 
companies from all sectors. 

The financial benefits are already clear. 
Sustainable Asset Management, a company
that directs investment dollars, has named
Amgen a ‘sustainability mover’, opening up
new sources of investment. 

Amgen’s initiative is not window dressing.
It is rooted deep in a broad commitment
to deliver medicines to people in a way that 
uses fewer resources and promotes industry 
competitiveness. We will continue to focus 
on the principles of green chemistry, seek 
operational efficiency, explore new technolo- 
gies and support education and research. And 
we will work with other companies in non- 
competitive areas to encourage the spread 
of green chemistry throughout the industry. 
There is still much to do to convince others to 
create a culture of sustainability. 

Green chemistry can deliver for people, 
planet and profit. Those who embrace it will 
reap the benefits in future. Those who fail to 
evolve may cease to be relevant.


John L. Tucker is a senior scientist in 
process development, and Margaret 
M. Faul is executive director of process 
development, at Amgen Inc., Thousand 
Oaks, California, USA. 

e-mail: tuckerj@amgen.com; 
mfaul@amgen.com 


The seas cannot be saved on a budget of breadcrumbs


The marine arm of UNESCO’s World Heritage Convention needs secure funding to 
realize its vast potential to protect the ocean, argues Fanny Douvere. 


A global exemplar in protecting Earth’s
most iconic places is the 1972 World
Heritage Convention of the United
Nations Educational, Scientific and Cultural
Organization (UNESCO). The convention's
founding years are credited with rescuing 
the ancient Egyptian temples of Abu Simbel 
from being lost under the Nile — an opera- 
tion that required collaboration across more 
than 50 countries. 

World-heritage recognition has since 
become a hallmark for sustainable protec- 
tion of valuable sites, from Peru’s Machu 
Picchu to Tanzania's Serengeti National Park. 
UNESCO's World Heritage List reflects the 
common heritage of humankind, a legacy to 
pass on to future generations. But its impact 
is felt mostly on land. 

UNESCO also has a World Heritage 
Marine Programme, which I lead. It was 
created by UNESCO’s World Heritage 
Committee just over ten years ago to help
secure effective conservation for marine sites 
on this list. Although the programme has a 
powerful brand that enables effective negotia- 
tion with government bodies and civil soci- 
ety, it is unfunded. Like a non-governmental 
organization (NGO), it must secure financial 
support from various sources. Finding these 
sources is difficult. Governments struggle to 
cover the costs of specialized programmes 
across the UN. Philanthropic organiza- 
tions often seem more comfortable funding 
research institutions or NGOs. The World 
Heritage Marine Programme currently has 
just 3 professionals to cover 47 sites across 
36 nations (see ‘Ocean treasures’).
World-heritage marine work can help 
governments to design feasible approaches 
to the threats that face some of the last wild 
places on Earth. Despite UNESCO’s estab- 
lished ability to influence governments, and 
to formulate and implement effective change
for sustainable management, philanthropic 
contributions are hard to come by. With the 
oceans facing existential threats from pol- 
lution, climate change and overfishing, it is 
time to invest in one of the best tools avail- 
able for conservation. 


TRACK RECORD 
Our programme can deliver far-reaching 
impacts where others cannot. 

In 2011, the Australian government con- 
sidered the protection of the Great Barrier 
Reef adequate even as scientists increasingly 
warned that the reef was in poor condition 
and getting worse. Despite the iconic status
of the reef, and it having been a prime exam- 
ple of marine-protected-area management for 
40 years, the site had suffered from decades of 
incremental decisions that threatened ‘death 
by a thousand cuts’. More than two-thirds of
all coastal-development proposals near the
reef submitted between 1999 and 2011 had
been approved. Previous government financial
commitments to halt and reverse the
declines in water quality (declines largely
responsible for the loss of coral systems closest
to the coast) came up for revision in 2013
but renewal was uncertain. In 2012, the World
Heritage Committee issued its first warning
that it would list the site as ‘world heritage in
danger’ unless it saw proof of substantial progress
by the following year.

Australia’s Great Barrier Reef is a UNESCO World Heritage Marine Site. Credit: XL Catlin Seaview Survey.

This opened the way for the UNESCO 
World Heritage Centre and its scientific 
advisory body, the International Union 
for Conservation of Nature, to embark on 
extensive negotiations with the Australian 
government, eventually changing the gov- 
ernment’s approach entirely. It reversed its 
original plan to dump 3 million cubic metres 
of dredged material into the Great Barrier 
Reef. Indeed, in 2015, it banned the dumping 
of dredged material throughout the world- 
heritage site — an area larger than Italy. 
Australia’s government committed more 
than Aus$200 million (US$145 million) to 
improve water quality and set an ambitious 
aim to reduce pollution runoff by 80% by 
2025. Proposed port-development areas 
have been restricted from 11 to 4 major ones, 
and future coastal development must align 
with a strategic plan aimed at improving the 


health of the reef between now and 2050. 

In his address to the World Heritage Com- 
mittee last July, Australia’s environment min- 
ister, Greg Hunt, said that UNESCO advice 
had allowed Australia “to do in 18 months 
what otherwise would have taken decades”. 

Something similar happened in the Belize 
Barrier Reef, the world’s second largest coral- 
reef system. This site was placed on the list 
of world heritage in danger in 2009 because 
of the destruction of mangrove forests for 
coastal development and ongoing threats of 
offshore oil exploration. We began intensive 
talks with the government and stakehold- 
ers in early 2015, which led to a road map to 
reverse the endangered status. Last Decem- 
ber, following years of deadlock, the Belizean 
government announced a permanent ban on 
all oil exploration in the site; in February, it 
approved an ambitious coastal-management 
plan. These are concrete steps that can lead to 
a brighter future for this unique array of reef 
types, and the nearly 200,000 Belizeans who 
depend on it for their livelihoods. 

These are just tasters of the sustainable 
change that a properly funded World Herit- 
age Marine Programme could bring. Since 
the first true marine site was inscribed on the 
UNESCO World Heritage List in 1982, our 
scope has grown into a global collection of 
sites that stretch from the tropics to the Arc- 
tic. The list includes the breeding grounds 
of the world’s last healthy population of grey 
whales (Eschrichtius robustus), in Mexico; 
the highest density of ancestral polar-bear 
dens, in Russia; and the home of one of the 
world’s most ancient fish, the coelacanth, 
in South Africa, and that of the inimitable 
marine iguanas (Amblyrhynchus cristatus), 
in the Galapagos Islands. 


EFFECTIVE MANAGEMENT 

Most of these places host a range of activities 
aimed at conservation and at generating 
income. Tension between opposing con- 
cerns is inevitable, and the most durable 
solutions emerge when diverse viewpoints of 
activists, scientists and government officials 
are effectively mediated. Our programme is 
uniquely positioned to take on this role. 

In its first few years, the World Heritage 
Marine Programme had limited capacity 
and mainly worked by supplying recom- 
mendations and basic guidance to a handful 
of sites. But better science, the uncertainties 
of climate change and increasing pressure 
to use ocean space have brought shifting 
information and demands. Effective man- 
agement of the flagship marine protected 
areas required us to adopt a more flexible, 
hands-on approach. Now we coordinate 
technical support missions, bring site 
managers and external experts together 
to exchange ideas, and increasingly bro- 
ker solutions with government leaders to 
secure the urgently needed protection of 
irreplaceable marine ecosystems.

Our work in Belize and Australia shows 
that if we can dig into the nitty-gritty of problems
collaboratively we can effect change. But
without a regular income, the programme 
cannot support even the sites that most need 
focused attention. For every case like the 
Great Barrier Reef, there are other urgent ones 
that do not get support. 

For example, for nearly a decade, the World 
Heritage Committee has recommended that 
the Panamanian government establish a 
comprehensive plan for sustainable fisheries
in Coiba National Park. Overfishing in
this park, considered the jewel of Panama,
has almost wiped out once-abundant
hammerhead sharks and boosted the jellyfish
population. We must take advantage
of the possibility of
working with the Panamanian government
to deliver sustainable fisheries and secure a
healthy future for Coiba.

The same can be said for the Sundarbans 
in Bangladesh, part of the largest mangrove 
forest in the world, which is now threatened 
by coastal development. Another example 
is Banc d’Arguin National Park in Maurita- 
nia, where catch from fishing just outside 
the site’s borders has increased more than 
12-fold from 1994 to 2010. But with current 
funds, we simply cannot be everywhere. 


OCEAN INVESTMENT 

To address the challenges at Coiba and other 
sites that struggle with illegal or unsustain- 
able pressures, the World Heritage Marine 
Programme needs to be recognized as an 
effective body worthy of investment. It must 
be able to plan for the long term and focus 
attention on the most urgent priorities. 

The successes mentioned here were made 
possible largely as a result of the steady and 
mostly unconditional support that Swiss 
watchmaker Jaeger-LeCoultre has provided 
to the programme since 2008. This has 
allowed the programme to steer away from 
a short-term project-by-project approach, 
and instead concentrate on the type of input 
needed to achieve the ultimate measure of 
success: improved conservation of sites’ 
treasured values that won their world- 
heritage recognition in the first place. 

To expand efforts and coordinate our 
work so that it is efficient and impactful, we 
need a broad base of stable financial support. 

We now stand at a moment of even greater 
opportunity to preserve the open ocean — 
waters not subject to any single country’s 
jurisdiction. These expanses, known as the 
high seas, cover half our planet. They also 
need protection that few — if any — mechanisms
provide. From 2010 to 2012, seven
marine protected areas were established,
covering more than 285,000 square kilometres
in the Atlantic Ocean under the
auspices of OSPAR, a cooperative effort by
15 governments and the European Union
to protect the northeast Atlantic. These are
some of the first of very few protected areas
in the high seas. But this is only a regional
action. The UN, under the 1982 Convention
on the Law of the Sea, has started negotiations
for a possible new agreement to protect
high-seas biodiversity on a global scale. This
framework agreement is real progress, but as
yet lacks practical procedures to nominate,
oversee and protect sites.

[Figure: Ocean treasures. UNESCO recognizes 47 exceptional world-heritage marine sites across 36 nations. Pictured: marine iguana (Amblyrhynchus cristatus) in the Galapagos Islands.]

Enforcing protection of the high seas is
one of today’s biggest challenges in ocean
conservation. The world-heritage system 
is equipped to help get this protection in 
place. It has a 40-year history of identifying 
and overseeing the state of conservation of 
places of Outstanding Universal Value across 
163 nations and has had ample successes. 
Such institutional experience is unparalleled 
in nature conservation, but its capacity to 
navigate such complexities and to preserve 
ecosystems is often overlooked. 

In a February interview with The New York
Times, biologist E. O. Wilson called for creating
something equivalent to the UN world-heritage
sites to protect the open ocean as
a priceless asset of humanity.

Our ability to engage constructively with 
government is starting to produce real, last- 
ing results in conservation. It could be rep- 
licated in other marine sites to great effect. 
Strengthening the international oversight of 
flagship protected areas will amplify the work 
of scientists, local NGOs and related organiza- 
tions. Their concerns become international 
causes that, through a tactical and skilled use 
of the World Heritage Convention, can lead to 
government action that benefits us all. 

Being effective requires robust investment.
The World Heritage Convention
cannot change the world on a budget of
breadcrumbs. Philanthropists seeking
investments that make lasting changes
should look beyond their conventional
outlets of NGOs and research institutions
and consider this potential. Ignoring world
heritage is a lost chance for our oceans.

[Figure: Green turtle (Chelonia mydas) in Aldabra Atoll.]




Fanny Douvere is coordinator of the World 
Heritage Marine Programme of the United 
Nations Educational, Scientific and Cultural 
Organization (UNESCO), Paris, France. 
e-mail: f.douvere@unesco.org
BOOKS & ARTS


ASTRONOMY 


The primary mirror of the James Webb Space Telescope, which will launch in 2018. 


Cosmic detectives 


Bernie Fanaroff surveys a study that probes telescopes 
in history and across the electromagnetic spectrum. 


It is a little odd that many astronomers
still call themselves optical, radio or
X-ray astronomers. The major problems
of astrophysics and cosmology, such
as how stars form and the nature of active
galactic nuclei, cannot be solved by observing
in only one part of the electromagnetic
spectrum. Thus we live in the era of multiwavelength
and multimessenger astronomy,
which demand different kinds of telescope
and technology to observe different parts of
the spectrum and even other particles and
waves, such as neutrinos, cosmic rays and
gravitational waves.

In Eyes on the Sky, British astronomer
Francis Graham-Smith delivers a valuable
survey of the history, technology and design
of telescopes across the electromagnetic
spectrum, starting with Galileo Galilei’s
pioneering seventeenth-century refracting
telescope. Graham-Smith explains the
principles of how telescopes, such as optical 
reflectors or X-ray telescopes in space, make 
images or spectra and how they detect waves 
and photons, using everything from radio 
receivers to solid-state mega-pixel charge- 
coupled devices. As he notes, the field has 
been transformed, especially in recent years, 
through a combination of technical advances 
and radical change in astronomy’s organiza- 
tion and scale, with the advent of large inter- 
national teams and multinational projects. 

Eyes on the Sky: A Spectrum of Telescopes
FRANCIS GRAHAM-SMITH
Oxford University Press: 2016.

Astronomers are always pushing the
boundaries of technology, out of the need
to detect more and more of the spectrum
from increasingly faint objects. Graham-Smith’s
account of that process is fascinating.
Some of the groundbreaking technological
advances have been in detectors,
very fast electronics and computing, and
space telescopes. As he explains, satellite
observation is essential for the parts of the
spectrum blocked by Earth’s atmosphere,
such as X-rays. New-generation telescopes
of this kind currently
include NASA’s Kepler
space observatory and the European Space
Agency’s Planck satellite telescope, while 
next-generation satellite instruments will 
include NASA’s James Webb Space Tele- 
scope (JWST). And the JWST, along with 
ground-based instruments such as the radio 
telescopes of the Square Kilometre Array 
(SKA) in South Africa and Australia, will 
produce huge quantities of data from sky 
surveys of unimagined sensitivity and scope. 
With young researchers able to access a flood 
of wonderfully exciting data, this will be a 
new golden age. Meanwhile, the detection 
of gravitational waves with the Advanced 
Laser Interferometer Gravitational-Wave
Observatory, announced this year (S. Rowan 
Nature 532, 28-29; 2016), heralds the begin- 
ning of gravitational-wave astronomy. 

Graham-Smith gives a useful summary of 
what is to come from these telescopes and 
surveys. Huge data sets of galaxies and other 
objects are being produced by sky surveys 
at different wavelengths, and many astrono- 
mers spend a large part of their time cross- 
matching the objects found. For instance, 
objects found in radio surveys must be 
matched with those found in surveys at opti- 
cal wavelengths, to learn about the source of 
the radiation (such as a galaxy) and its distance
(measured using the Doppler shift, or
‘red shift’, of the spectrum, which is caused
by the expansion of the Universe). 

There are many challenging problems ripe 
for cracking. One is the structure of the Uni- 
verse. How did a once-uniform ball of very 
hot gas and energy become a highly struc- 
tured, complex Universe, evolving over the 
13.8 billion years since the Big Bang? Optical 
surveys such as the Sloan Digital Sky Sur- 
vey tell us about the distribution of galaxies, 
galactic clusters and super-clusters. Min- 
ute fluctuations in the cosmic microwave 
background radiation — measured by the 
Planck telescope, among others — tell us 
about conditions when the Universe was 
only 380,000 years old, before the first stars, 
galaxies and clusters formed. The SKA will 
probe this ‘cosmological dawn’ and track the
development of structure by looking at the 
history of hydrogen gas in the Universe. 




As Graham-Smith discusses, the structure 
and evolution of galaxies is another hot topic. 
Almost all galaxies have supermassive black 
holes spinning at their centres, with masses 
millions to billions of times that of the Sun. 
Vast amounts of energy are radiated from 
near the black hole, or carried off as kinetic 
energy by collimated (very narrow) jets 
squeezed out along the poles of rotation and 
extending, in some cases, for a megaparsec. 
This process is probably key to the formation 
of stars and the evolution of galaxies. Energy 
from the jets and radiation is dumped into 
the gas between the stars and galaxies, and is 
believed to significantly influence the rate of 
star formation and, as a result, galaxy evolution.
The heating and stirring of the gas in
turn affects the rate of accretion and energy 
generation around the black hole, in a pow- 
erful feedback mechanism. 

To refine their picture of this activity, 
astronomers are marshalling findings from a
range of telescopes to map the jets’ radio emis- 
sion and estimate their kinetic and magnetic 
energy, as well as the energy emitted at optical 
and ultraviolet wavelengths. They are using 
X-ray observations to determine how hot the 
gas is, and infrared observations to gauge how 
much dust there is in the interstellar medium. 
They observe spectral lines at millimetre 
wavelengths to map the outflow of molecular 
gas. X-rays and γ-rays also tell us about the
gas dynamics close to the black hole or in the 
region where the jets are launched. 

Eyes on the Sky does contain a few surprising
errors. For example, the Karl G. Jansky
Very Large Array radio telescope in New
Mexico still has 27 dishes after
its upgrade, not 36. Nevertheless, Graham- 
Smith’s book is a very interesting explana- 
tion of the multitude of telescopes and their 
history. 

Telescope technology continues to 
develop at breakneck speed. The SKA, for 
instance, demands new technologies to 
increase sensitivity, process huge quantities 
of data very fast and keep costs in check. 
This and other planned great observatories
— the JWST, as well as the γ-ray-seeking
Cherenkov Telescope Array and the
optical/near-infrared European Extremely 
Large Telescope on the ground — are likely 
to produce major discoveries in areas such 
as transient sources of radiation, the under- 
standing of planet formation, the nature 
of dark matter and the history of the Uni- 
verse. They will undoubtedly also uncover 
unknown unknowns, those serendipitous 
discoveries that are the hallmark of the great 
telescopes of history.


Bernie Fanaroff was the director of the 
Square Kilometre Array South Africa project 
until the end of 2015 and is now a part-time 
strategic adviser to the project. 

e-mail: bfanaroff@ska.ac.za 


Books in brief 


Earth-Shattering Events: Earthquakes, Nations and Civilization 
Andrew Robinson THAMES AND HUDSON (2016) 

A “fatal attraction”: geophysicist James Jackson’s description of 
humanity’s penchant for living in earthquake zones is all too apt, notes 
science writer Andrew Robinson in this compelling history of seismicity 
and society. Robinson traces more than 2 millennia of cataclysms, 
vividly evoking events such as the magnitude-8.8 quake-cum-tsunami 
that largely flattened Lisbon in 1755. Woven through is a history
of seismology from its first glimmerings in ancient China, through
geologist John Milne’s groundbreaking work in the nineteenth century 
to today’s hurdle-ridden drive to predict seismic risk. 


Engineering Eden 

Jordan Fisher Smith CROWN (2016) 

In 1972, a grizzly bear eviscerated tourist Harry Walker in 
Yellowstone National Park, Wyoming. His family’s lawsuit against 
the US National Park Service ignited a vastly broader debate about 
‘managed nature’. In this beautifully synthesized study, writer (and 
former ranger) Jordan Fisher Smith argues for symbiotic balance in 
our interaction with the wild, because “the ties that bind, bind in all 
directions”. As he shows, expert witnesses such as zoologist Starker 
Leopold helped to shift Yellowstone’s mismanagement of bears — 
notably the deliberate feeding that predisposed them to attack. 


Blue Skies over Beijing: Economic Growth and the Environment in 
China 

Matthew E. Kahn and Siqi Zheng PRINCETON UNIVERSITY PRESS (2016) 
Beijing’s atmospheric pollution in 2013 reached 40 times the safe 
level set by the World Health Organization. To gauge progress on 
the country’s urban sustainability, economists Matthew Kahn and 
Siqi Zheng apply microeconomics to industry, pollution dynamics,
and local and central government efficacy. They see that analysis 
— along with factors such as growing environmental awareness in 
China, and evidence of sharply improved air quality in some post- 
industrial US cities — as potentially heralding a turnaround. 


Grit: The Power of Passion and Perseverance 

Angela Duckworth SCRIBNER (2016) 

When psychologist Angela Duckworth received a MacArthur 
Fellowship, or ‘genius grant’, in 2013, the irony was not lost on her; 
for years, her father had said she was “no genius”. But Duckworth 
saw sheer dogged effort as brilliance of a different sort, and 
ultimately more important to achievement than talent. She lucidly 
anatomizes the nature of grit, drawing on her own and others’ 
research (such as psychiatrist George Vaillant’s ‘treadmill test’), and 
explicating the passion, purpose, practice and optimism that feed 
perseverance and resilience. A deft corrective to IQ culture. 


A Sea of Glass: Searching for the Blaschkas’ Fragile Legacy in an 
Ocean at Risk 

Drew Harvell UNIVERSITY OF CALIFORNIA PRESS (2016) 

In nineteenth-century Bohemia (now the Czech Republic), master 
glassblowers Leopold and Rudolph Blaschka spun supremely lifelike 
replicas of organisms as teaching tools. Ecologist Drew Harvell, finding 
more than 500 models of marine invertebrates at Cornell University 
in Ithaca, New York, set out to restore them. Stunning photos of a 
number of them contextualize the dramatic taxonomic and ecological 
shifts in ocean life over the past 150 years. Barbara Kiser 
Peace, love and lab work


Ann Finkbeiner delves into a collection reappraising the hippy tech-heads, 
agronomic groovers and far-out ecodesigners of the ‘long 1970s’. 


For a lively decade or so in the 1960s
and 1970s, the younger generation
in the United States looked at its
elders — with their unwinnable wars, florid
military-industrial complex, intransigent
racism and contaminated brownfields —
and was outraged. In particular, young people
rejected what they saw as the foundations
of many establishment ills: the weapons and
toxic chemicals spawned by science and
technology. That is the standard history,
say historians of science David Kaiser and
Patrick McCray — and it’s not quite right.

In their edited volume Groovy Science,
Kaiser (author of How the Hippies
Saved Physics (W. W. Norton, 2011); see
H. Gusterson Nature 476, 278-279; 2011)
and McCray show that in the ‘long 1970s’,
the young, in creating a counterculture,
didn’t so much reject science as recreate it.
Each essay is a case history on how the hippies
repurposed science and made it cool.

What they rejected was the work of defence
contractors, big government or corporate
labs, which they deemed hierarchical, inflexible
and bound to special interests. By contrast,
‘groovy’ science was (as hinted at by the word’s
origins in 1930s jazz) playful and improvisational,
small-scale and done in the name of
peace by “world-thinkers, dropouts from
specialization”. Their research ranged from
the practical (light, strong surfboards) to the
visionary (space travel). Because some were
drug-addled, it also encompassed the hare-brained
(communication with dolphins, the
fervent wish of physician John Lilly). Some of
it is now thoroughly mainstream.

A number of these hippies were conventionally
trained scientists with doctorates.
Psychologist Abraham Maslow looked
beyond the behaviourism — the idea that
humans are blank slates who react to stimuli
— advocated by psychologist B. F. Skinner
and others. Maslow’s alternative was a
humanistic ‘hierarchy of needs’, the fulfilment
of which would lead to happiness, self-actualization,
and ultimately a better society.
He became a patron of the Esalen Institute in
Big Sur, California, a centre for workshops,
encounter groups — forums that encouraged
face-to-face communication and confrontation
— and yoga classes, all aimed at training
people to realize their potentials. His innovative
approach, focusing on good mental
health rather than pathological symptoms,
persists in ‘positive psychology’.

[Figure: Writer Stewart Brand in 1966.]

John Todd, a biologist at San Diego State 
University in California, worried that 
industrial agriculture was creating danger- 
ously limited monoculture. He organized a 
network of like-minded professionals, the 
New Alchemists, who encouraged agricul- 
turally self-sufficient communities and built 
enclosed ecosystems, or ‘bioshelters’, such as
the Ark on Canada’s Prince Edward Island. 
Similar principles and technologies are 
now used by ecodesigners in creating green 
buildings that incorporate renewable mat- 
erials and have sustainable energy demands. 

Groovy Science: Knowledge, Innovation, and American Counterculture
EDITED BY DAVID KAISER AND W. PATRICK MCCRAY
University of Chicago Press: 2016.

Countercultural researchers jump-started
interest in midwifery and home births; they
also learned to produce their own cheese,
making goat’s cheese “no longer weird” in
the United States. The grooviest of them all
were arguably the bricoleurs, engineers
in garages, who used “whatever comes to
hand”. As groundbreaking publisher
Stewart Brand wrote in the Whole Earth
Catalog (part encyclopaedia and part
how-to manual, often cited as anticipating
the Internet), these were doers “with a func- 
tional grimy grasp on the world”. Welder and 
designer Steve Baer and his friends experi- 
mented with passive solar collectors, and 
geodesic domes partly crafted from junked 
cars, in Colorado commune Drop City. He 
adapted the ideas for his New Mexico-based 
company Zomeworks. James Baldwin, a 
jack-of-all-trades, built a truck with fold-out 
sides that served as a travelling workshop and 
classroom for ecodesign technologies such 
as solar panels. Baldwin's toolkit was used 
to construct sustainable buildings, includ- 
ing the Farallones Institute in Occidental, 
California, established by architect Sim Van 
der Ryn to teach ecological design. As with 
the New Alchemists, this stream flowed into 
contemporary ecodesign. 

Most noticeable about these science freaks 
was their cheery willingness to share what- 
ever they knew or learned through manuals, 
periodicals and books, many of them best- 
sellers. Publications such as the magazine 
The Great Speckled Bird spread the ecodesign 
gospel. Science and science-fiction magazine 
Omni, influenced by psychologist and avid 
user of hallucinogenic drugs Timothy Leary, 
popularized his ideas of space migration and 
transhumanism — transcending human limi- 
tations. The Whole Earth Catalog famously 
offered tools for developing “person power” 
— everything from hammers to guides for 
building a pipe organ and books on population
control. “We are as gods,” Brand wrote,
“and might as well get good at it.”

For the academic historian, Groovy Science 
establishes the “deep mark on American cul- 
ture” made by the countercultural innova- 
tors. For the non-historian, the book reads as 
if it were infected by the hippies’ democratic 
intent: no jargon, few convoluted sentences, 
clear arguments and a sense of delight. 
Because of “acid-spooked scientists, stoned 
tinkerers, and many of the other straight-up 
hippies and freaks,” write historians Beth
Bailey and David Farber in the afterword, 
“a substantial subset of Americans came to 
rethink how they eat, how they communi- 
cate, how they stay healthy, how they design 
and build, and how they have fun.”


Ann Finkbeiner is a science writer in 
Baltimore, Maryland. 
e-mail: anniekf@gmail.com 
Broad Institute keeps
CRISPR tools open 


As chief communications officer 
at the Broad Institute of MIT and 
Harvard, I wish to clarify that the 
institute makes patent rights for 
CRISPR-Cas9 genome-editing 
technologies available globally 
across academia and industry 
(see J. Sherkow Nature 532, 
172-173; 2016). 

For academic research, the 
patent rights are freely available 
and we openly share CRISPR 
reagents through the non-profit 
repository Addgene. So far, 
Addgene has processed more 
than 30,000 requests for these 
reagents. 

For commercial research, we 
designed a non-exclusive licensing 
model. For commercial products, 
we also follow a non-exclusive 
model — except for human 
therapeutics, for which we use an 
‘inclusive innovation’ model. This
is because companies often need 
exclusivity to justify investing in 
expensive clinical trials. 

The CRISPR-Cas9 licensing 
agreement with our primary 
licensee, Editas, stipulates 
that, for target genes not being 
pursued by Editas, we (Broad, 
Harvard and MIT) will make the 
licences available to other parties 
to develop new medicines. This 
helps to ensure that no promising 
target genes will be neglected. 
Lee McGuire Broad Institute of 
MIT and Harvard, Cambridge, 
Massachusetts, USA. 
lmcguire@broadinstitute.org


Revive China’s green 
GDP programme 


In a potentially big step towards
achieving its target of sustainable 
growth by 2020, China's 
government is developing a 
green measure of gross domestic 
product (GDP). We suggest that 
the country’s upcoming audit of 
its natural-resource assets would 
provide an ideal opportunity to 
launch this ‘green GDP’, which
factors the environmental costs 
of economic growth into the 


conventional GDP. 

The government is recognizing 
that economic growth comes 
at too high a price. The cost of 
China’s pollution damage roughly
quadrupled from 2004 to 2013, and
has accounted for up to 3% of 
annual GDP over the past decade. 
Each year, there are 350,000 to 
500,000 premature deaths from 
particulates in cities (Z. Chen 
et al. Lancet 382, 1959-1960; 
2013). Indeed, the health cost of 
air pollution amounts to one- 
third of total environmental costs. 

Although China’s original
green GDP programme of 2006 
was shelved within a year, studies 
on a green GDP index have never
stopped. In the push for green 
development, faster economic 
growth is no longer the priority. 
And under China’s latest Five-Year
Plan (see Nature 531,
425-426; 2016), local 
governments are now accountable 
for environmental quality and 
ecological conservation. 
Jinnan Wang* Chinese Academy 
for Environmental Planning, 
Beijing, China. 
wangjn@caep.org.cn 
*On behalf of 7 correspondents (see 
go.nature.com/rsebg9 for full list). 


Bee-hawking hornet 
already in line of fire 


We agree with Frederico 
Santarém and colleagues that 
public campaigns will help 
to control the invasive Asian 
hornet Vespa velutina (see Nature 
532, 177; 2016). However, this 
bee-hawking hornet has been 
on Europe's risk-assessment list 
for invasive alien species and 
targeted for action since June 
2015 (see go.nature.com/gigftz). 
It has also been intensively 
researched since 2008 under the 
European Agricultural Guarantee 
Fund's apiculture programme. 
The only way found so far to 
contain the V. velutina invasion 
is to destroy colonies as soon as 
nests are spotted (see J. R. Beggs 
et al. BioControl 56, 505-526; 
2011). Public awareness and 
collaboration are crucial to help 


detect these nests in tree crowns. 
The hornet’s real threat is to 
pollinators, not to humans (see, 
for example, C. Villemant et al. 
Biol. Conserv. 144, 2142-2150; 
2011). EU legislation aims to 
coordinate a plan for invasion 
control, which also depends 
on a greater willingness among
European researchers to work 
together. 
Quentin Rome, Claire Villemant 
Institut de Systématique, Evolution, 
Biodiversité (ISYEB), UMR 7205 
— CNRS, MNHN, UPMC, EPHE, 
Sorbonne Universités, Paris, France. 
rome@mnhn.fr 


Industry parks limit 
circular economy 


We suggest that China's proposed 
circular economy should cover 
the entire life cycle of products 
and not just focus on industrial 
parks (see J. A. Mathews and 
H. Tan Nature 531, 440-442; 
2016). 

Consumer-waste recycling, 
for example, should also be 
part of the circular economy. 
The delivery of online orders 
in China last year accounted 
for some 8 billion plastic bags, 
10 billion boxes and 17 billion 
metres of adhesive tape, yet most 
of the retailers and companies 
responsible have no recycling 
arrangements (see go.nature. 
com/pv2omgq; in Chinese). 

Industrial parks designed 
for a circular economy can 
have serious limitations, 
because the interdependence 
of manufacturers creates a 
vulnerability. For example, if 
any one of them closes down 
or switches to other products, 
the whole production chain can 
collapse. 

Moreover, these parks 
cannot be built everywhere. 
Site selection depends on local 
manufacturing priorities and on 
that area’s environmental, social 
and technological conditions. 
Transforming conventional 
industrial parks and zones to 
circular-economy parks has also 
been problematic because of poor 


planning, design and supervision. 
Government-controlled 
circular-economy projects need 
to be made publicly accountable 
to safeguard against financial 
corruption and to ensure 
transparent oversight. Disorderly 
operation, enforcement or 
supervision in the recycling 
of pollutants or hazardous 
materials, for example, can lead 
to disasters such as last year’s 
huge chemical explosion at 
Tianjin (see Z. Tang et al. Nature 
525, 455; 2015). 
Xin Miao Harbin Institute of 
Technology, Harbin, China. 
Yanhong Tang Northeast 
Agricultural University, Harbin, 
China. 
xin.miao@aliyun.com 


Ukraine should cut 
back nuclear power 


Thirty years on from the 
Chernobyl nuclear disaster, 
the Ukrainian government has 
increased the contribution of 
nuclear power to the nation’s 
total energy balance. In 1991, it 
was 8%; by 2014, it had risen to 
22% (go.nature.com/s4qgjk). 

In my view, Ukraine should 
be following Lithuania’s lead. 
Lithuania has ceased to depend 
on nuclear power, substituting 
renewable energy sources (mostly 
biofuels) for its former 26% 
nuclear-power contribution in 
1991. Renewables are likewise 
thriving in Latvia (39%) and 
Estonia (27%) (go.nature.com/ 
z3ibww); Georgia (31%; go.nature. 
com/Sihqsg); and Kyrgyzstan 
(28%; go.nature.com/slcnsw). 

Ukraine lags way behind 
because of the funding deficit 
for new green technologies, with 
renewables accounting for just 
2.6%. This is despite the country’s 
favourable conditions for green 
energy, including wind, solar 
and hydropower. This untapped 
potential could be swiftly realized 
with appropriate financial and 
legislative support. 
Alexander Gorobets Sevastopol, 
Crimea. 
alex-gorobets@mail.ru 
OBITUARY


Walter Kohn 


(1923-2016) 


Condensed-matter physicist who revolutionized quantum chemistry. 


Walter Kohn’s profound questioning
of what the arrangement of electrons
can tell us about a material's
character led to density functional theory.
The theory, which predicts electron energies, 
became a basic tool in efforts to compute the 
properties of materials and the outcomes of 
chemical reactions. Some say that it revolu- 
tionized quantum chemistry, the applica- 
tion of quantum mechanics to the study of 
molecules. 

Kohn, who died on 19 April, was born in 
Vienna in 1923. In 1939, not long after the 
annexation of Austria by Nazi Germany, 
Kohn’s parents sent him to England on a 
convoy of the Kindertransport, an opera- 
tion to rescue Jewish children from Europe 
before the outbreak of the war. His mother 
and father were later killed at Auschwitz. 

In 1940, as a holder of a German passport, 
Kohn was shipped to the first of what would 
be a series of internment camps in Canada. 
Once free to leave, he began studies at the 
University of Toronto, where he earned a 
bachelor’s degree in mathematics and physics 
and a master’s degree in applied mathematics. 
In 1948, he completed a PhD in nuclear 
physics at Harvard University in Cambridge, 
Massachusetts; his supervisor was the 
Nobel-prizewinning theoretical physicist 
Julian Schwinger. 

In 1950, after a short stint of postdoc- 
toral work, Kohn took a professorship at 
the Carnegie Institute of Technology in 
Pittsburgh, Pennsylvania (now Carnegie 
Mellon University). A decade later, he joined 
the physics department at the University of 
California, San Diego, where he worked for 
nearly 20 years before becoming the found- 
ing director of the Institute for Theoretical 
Physics at the University of California, Santa 
Barbara (now the Kavli Institute). 

A condensed-matter system, from a sin- 
gle atom to a living organism, is composed 
of nuclei and electrons. The electrons roam 
in an energy landscape provided by the 
nuclei, and each electron is influenced by 
the others. The electrical charges of any pair 
of electrons in the same energy landscape 
interact, and no electron can exist in the 
same state as another in the same energy 
landscape (the Pauli exclusion principle). 

In the 1950s and 1960s, physicists were 
using two different approaches to compute 
the energy states of electrons in a material. 
In both approaches, the energy landscape 
was thought to be key to the prediction of 
the properties of a system, including the 
distribution of electron density. The density 
functional theory swapped the cause and 
effect roles between the energy landscape and 
the electron-density distribution. This paved 
a way to compute the properties of functional 
importance to technologies and to life, such 
as electronics and photosynthesis. 

Around 1960, Kohn started to examine 
the change that occurs to the spatial dis- 
tribution of the electron density when an 
impurity is added to a metal. For a posi- 
tively charged impurity, the electrons pile 
up around it as expected. They also exhibit 
a wave-like distribution (Friedel oscilla- 
tions), which reflects a quantum property 
of the electrons. This quantum feature led 
Kohn to examine the possibility that the 
electron density contained the key to other 
properties. In 1964, while on sabbatical in 
Paris, he established with Pierre Hohenberg, 
a postdoc at the École Normale Supérieure, 
the Hohenberg-Kohn density theorem. This 
stated that the electron-density distribution 
(not the energy landscape) determines the 
properties of a many-electron system. 
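
In modern notation, the theorem can be sketched as follows. The symbols are introduced here for illustration and do not appear in the text above: v(r) stands for the external potential (the 'energy landscape' set by the nuclei), n(r) for the electron density, and F[n] for a universal functional collecting the kinetic and electron-electron interaction energies. In its simplest form (a non-degenerate ground state), the ground-state density fixes the potential up to an additive constant, and the ground-state energy is obtained by minimizing over densities:

```latex
n(\mathbf{r}) \;\longmapsto\; v(\mathbf{r}) + \text{const},
\qquad
E_0 \;=\; \min_{n}\,\Big\{\, F[n] \;+\; \int v(\mathbf{r})\, n(\mathbf{r})\, \mathrm{d}^3 r \,\Big\}
```

Because the density determines the potential, it also determines the Hamiltonian and hence, in principle, every property of the system.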

Returning to San Diego, Kohn prompted 
a postdoc, David Mermin, to generalize the 
theorem so that it could be applied to all 
temperatures. In 1965, he established with 
another postdoc (me) a way to use density 
functional theory to compute the properties 
of materials. 

Kohn’s PhD student at San Diego, Philip 
Tong, was the first to apply density functional 
theory to infer the electron energies of atoms 
of noble gases and of the sodium lattice. With 
his postdoc, Norton Lang, Kohn applied the 
theory to calculate properties of metal sur- 
faces in the early 1970s. Kohn and Lang won 
the Davisson-Germer prize in 1977 for their 
contribution to surface physics. For his work 
on density functional theory, Kohn shared the 
1998 Nobel Prize in Chemistry. 

For several years after the Hohenberg- 
Kohn theorem was published, theoretical 
chemists raised objections — almost unani- 
mously — to the central role of the electron- 
density distribution. They could prove that a 
more general property known as the density 
matrix was the fount of all electronic proper- 
ties. They thought that the electron-density 
distribution, which was only a component 
of this matrix, could not offer the same 
predictive power. In the end, people were 
persuaded by the simplicity of the proof of 
the theorem, and by the efforts of numerous 
researchers who showed its usefulness. 

Walter was meticulous in his research — 
but in sports he was adventurous. In 1996, 
he wrecked his shoulder skiing the day 
before a widely anticipated talk on density 
functional theory at an annual meeting of 
the American Physical Society, and asked 
me to speak in his stead. He said that he had 
used a mogul to launch a jump, recalling the 
ski jumps he had made as a child in Austria. 
On another occasion, he took his eldest 
daughter, Marilyn, and me sailing beyond 
the surf at La Jolla Shores beach in Califor- 
nia on a windy day. The boat capsized, and 
as we pushed it back towards the beach the 
surf ripped it from our hands. 

Walter cared deeply about social issues. At 
San Diego, he promoted the Judaic studies 
programme. He was also a vocal critic of the 
University of California’s association with 
the national weapons laboratories in Los 
Alamos, New Mexico, and in Livermore, 
California. And he was proud of producing 
a documentary film promoting solar energy. 

Walter was an admired mentor and 
colleague, and will be missed by the many 
who came within his orbit. 


Lu J. Sham is distinguished professor 
emeritus of physics at the University of 
California, San Diego, San Diego, 
California, USA. 

e-mail: lsham@ucsd.edu 

PLANETARY SCIENCE 


Pluto’s polygons explained 


The Sputnik Planum basin of Pluto contains a sheet of nitrogen ice, the surface of which is divided into irregular polygons tens 
of kilometres across. Two studies reveal that vigorous convection causes these polygons. SEE LE 


ANDREW J. DOMBARD & SEAN O’HARA 


It is sunrise on a frozen plain. Shadows 
withdraw as the Sun climbs above distant 
mountains that rise from below the horizon. 
The ice sheet itself is largely featureless, 
with a difference in elevation of only some tens 
of metres over distances of many tens of kilometres; 
nothing slows the shadows’ retreat. 
The sky overhead remains black, and it stays 
chilly, only about 35 degrees above absolute 
zero. This is morning on Sputnik Planum, 
Pluto. 

The fly-by of Pluto (and its satellites) by 
NASA’s New Horizons space probe in July 
2015 revealed a spectacular world unlike any 
yet seen. The informally named Sputnik Planum 
is of special interest: a bright, flat-floored 
basin around 1,200 km in diameter that is 
filled mainly with nitrogen ice (Fig. 1). High-resolution 
images show a surface separated 
into polygonal cells 10-40 km in diameter, 
pockmarked by pits and fed by flowing nitrogen 
glaciers from the surrounding highlands. 

In this issue, Trowbridge et al. (page 79) and 
McKinnon et al. (page 82) investigate this 
polygonal terrain and conclude that it is 
continually and quickly resurfaced by convec- 
tion, making it one of the youngest surfaces 
in the Solar System. Pluto therefore joins 
Europa, Enceladus, Titan and Triton as a 
small and icy but geologically dynamic body 
of the outer Solar System — a far cry from 
the cold, dead worlds one might expect so far 
from the Sun. 

The nitrogen ice identified by New Horizons 
is a structurally weak solid with a very low melt- 
ing point (63 kelvin), and should flow viscously 
even at Pluto’s low temperatures*. As in other 
planetary bodies, Pluto’s interior is warmer than 
its surface because it is heated by the decay of 
long-lived radiogenic isotopes in the rocky com- 
ponent. How this heat escapes through Sputnik 
Planum has consequences for its surface geol- 
ogy. For a layer of weak nitrogen ice at least 
0.5-1 km thick, the most efficient heat-transfer 
mechanism is convection. 

Because the material is heated from the 
bottom, the heat will cause localized thermal 
expansion, making the heated material less 
dense than the rest of the overlying ice. In con- 
vection, the less-dense material is buoyant and 
will rise, carrying its heat content towards the 
surface, where it cools and then sinks. Viscous 
drag resists this buoyancy-driven movement, 
and convection can occur only if the buoyancy 
overwhelms the viscous resistance. 

This competition can be quantified. The 
ratio of buoyant to viscous forces in a layer 
defines a dimensionless parameter known as 
the Rayleigh number. If the Rayleigh num- 
ber is greater than a critical value, then the 
material convects. Both Trowbridge et al. and 
McKinnon et al. found that the Rayleigh num- 
ber of the polygonal terrain is several orders of 
magnitude greater than the critical value. Their 
results indicate that the nitrogen ice is vigor- 
ously convecting and that the cellular patterns 
are the tops of convection cells. 
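
The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the calculation performed in either paper: it uses the standard textbook form of the Rayleigh number for a constant-viscosity layer heated from below, and the textbook critical value for free-slip boundaries. Every input value in the example (density, gravity, expansivity, temperature difference, depth, diffusivity and viscosity) is an illustrative placeholder chosen for this sketch, not a measurement from the studies.

```python
import math


def rayleigh_number(density, gravity, expansivity, delta_t,
                    depth, diffusivity, viscosity):
    """Ra = rho * g * alpha * dT * d^3 / (kappa * eta), all in SI units."""
    buoyancy = density * gravity * expansivity * delta_t * depth ** 3
    resistance = diffusivity * viscosity
    return buoyancy / resistance


# Textbook onset value for a layer with free-slip top and bottom: 27*pi^4/4.
CRITICAL_RA_FREE_SLIP = 27 * math.pi ** 4 / 4


def convects(ra, critical=CRITICAL_RA_FREE_SLIP):
    """True if buoyancy overwhelms viscous resistance."""
    return ra > critical


# Placeholder inputs (assumptions for illustration only).
ra = rayleigh_number(
    density=1000.0,      # kg m^-3, assumed
    gravity=0.62,        # m s^-2, approximate surface gravity, assumed
    expansivity=2e-3,    # K^-1, assumed
    delta_t=10.0,        # K across the layer, assumed
    depth=5000.0,        # m, assumed
    diffusivity=1e-7,    # m^2 s^-1, assumed
    viscosity=1e12,      # Pa s, assumed
)
print(f"Ra = {ra:.3g}, critical = {CRITICAL_RA_FREE_SLIP:.4g}, "
      f"convects: {convects(ra)}")
```

Because the depth enters as a cube, the result is far more sensitive to the assumed layer thickness than to any other single input, which is one reason the thickness estimates matter so much.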

Figure 1 | Sputnik Planum. a, One of Pluto’s youngest terrains, informally known as Sputnik Planum, is a nitrogen ice sheet (visible here as the large pale 
expanse) that fills a topographic basin. b, This composite image showing a closer view of the nitrogen ice reveals irregular polygons about 10-40 kilometres in 
diameter (upper part of image). Two papers report that the polygonal terrain is caused by convection in the nitrogen ice. 
NASA/JOHNS HOPKINS UNIV. APPL. PHYSICS LAB./SOUTHWEST RES. INST. 

40 | NATURE | VOL 534 | 2 JUNE 2016 
© 2016 Macmillan Publishers Limited. All rights reserved. 

In addition, both groups report that the 
convective flow speeds are in the range of 
centimetres per year, meaning that the surface 
turns over in about 500,000 to 1 million years. 
This rapid resurfacing explains the lack of 
impact craters on the ice sheet. (In general, 
the older a planetary surface, the more impact 
craters will have formed.) 
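
The link between flow speed and surface age is simple arithmetic, sketched below. The speed of 4 cm per year and the travel distances of 20 and 40 km are assumptions picked for illustration (they sit within 'centimetres per year' and the observed cell sizes); they are not figures quoted from the papers.

```python
def turnover_time_years(distance_km, speed_cm_per_year):
    """Years needed to travel distance_km at a steady speed_cm_per_year."""
    if speed_cm_per_year <= 0:
        raise ValueError("speed must be positive")
    distance_cm = distance_km * 100_000  # 1 km = 100,000 cm
    return distance_cm / speed_cm_per_year


# Assumed speed and distances, for illustration only.
for distance_km in (20, 40):
    years = turnover_time_years(distance_km, speed_cm_per_year=4)
    print(f"{distance_km} km at 4 cm/yr: {years:,.0f} years")
```

Under these assumptions the travel times come out at the order of magnitude quoted above, far shorter than the age of the Solar System.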

Although the two papers report the same 
primary result, they differ in their conclu- 
sions about the convective regime, which 
determines the width-to-depth aspect ratio of 
the convection cells and hence the thickness of 
the nitrogen-ice layer. Trowbridge et al. argue 
that variations in the viscosity of the ice due to 
differences in stress and temperature across the 
layer are small enough that convection occurs 
in the Rayleigh–Bénard regime, which is 
characterized by the formation of cells that have 
widths similar to their depths. The cell size of 
10-40 km thus implies a layer thickness of at 
least 10 km. 

By contrast, McKinnon et al. argue that the 
temperature dependence of the nitrogen ice 
causes ‘sluggish lid’ convection, in which the 
viscosity is higher at the cooler surface than in 
the interior®. As the name suggests, this yields 
a slower-moving surface layer and cells that 
are much wider than they are deep, making 
the depth of the layer 3-6 km. The authors 
support this conclusion with numerical modelling 
that reproduces convection cells with 
sizes and surface topography that are consistent 
with observations. 

The layer thickness has important 
implications for Pluto's geological history. 
On the basis of the shape and ellipticity of the 
basin that holds Sputnik Planum, McKinnon 
and colleagues note that it is most probably an 
ancient impact crater. From scaling of other 
examples in the Solar System, it is known that 
an impact basin of this size can easily accommodate 
the depth of nitrogen ice estimated by 
McKinnon et al., but not the depth estimated 
by Trowbridge and colleagues. The deeper 
estimate requires a more complicated 
explanation of basin formation and evolution. 
Perhaps the weight of the nitrogen ice caused 
the basin to subside, for example. 

Both papers report that the quantity of ice 
in the basin is equivalent to a global layer 
several hundred metres in depth, commensurate 
with Pluto's total budget of nitrogen. But 
neither satisfactorily addresses how so much 
of the nitrogen budget could have collected 
there: was it for climatological reasons, as 
Trowbridge and co-workers speculate, or for 
glaciological reasons, as McKinnon et al. suggest? 
Clearly, this localization of nitrogen was 
a major event in Pluto’s evolution that needs 
to be explored. Fortunately, New Horizons 
continues to transmit data from its Pluto 
encounter back to Earth. It is to be hoped that 
these two papers will be the first step towards 
a deeper understanding of this distant world. 

Andrew J. Dombard and Sean O’Hara are in 
the Department of Earth and Environmental 


MICROBIOLOGY 


Pumping persisters 


The finding that antibiotics are pumped out of drug-tolerant bacterial cells by the 
TolC protein complex provides insight into how some cells, known as persisters, 
survive in the face of antibiotic treatments. 


KENN GERDES & SZABOLCS SEMSEY 


In bacterial persistence, a small fraction of 
an antibiotic-sensitive cell population has 
switched to a slow-growing or dormant 
state, and is drug tolerant. This differs from 
antibiotic resistance in that regrowth of a persistent 
population results in the same percentage 
of drug-sensitive cells as before. Persistence 
has been interpreted as a bet-hedging strategy 
that increases the survival rates of bacterial 
populations, and is medically relevant 
because it might sustain recurrent and chronic 
infections. Writing in Molecular Cell, Pu et al. 
challenge the widespread view that persistence 
is a passive state. The authors demonstrate that 
persister cells use an energy-dependent efflux 
pump protein called TolC to actively reduce the 
intracellular accumulation of antibiotic, a 
finding that might have both fundamental and 
therapeutic relevance. 

Pu and colleagues isolated persisters and 
labelled them with a fluorescent antibiotic 
called BOCILLIN, which is derived from penicillin. 
They observed the cells using microscopy 
and found that the antibiotic could 
penetrate persister cells. However, the average 
antibiotic concentration in the persisters 
was about 20% of that in the drug-sensitive 
population. 

TolC is the outer-membrane component of 
a family of efflux pumps that can move small 
molecules out of the cell from both the cytoplasm 
and the periplasmic space between the 
inner and outer bacterial membranes. Using 
sophisticated microfluidics combined with 
fluorescence microscopy, Pu et al. showed 
that TolC is responsible for the rapid export 
of BOCILLIN from persisters (Fig. 1). An 
alternative explanation could be that less 
of the antibiotic is taken up into cells in the 
first place, but the authors found that lower 
membrane permeability owing to depletion 
of porin proteins only slightly decreased 
BOCILLIN uptake. These observations raised 
the possibility that increased TolC levels contribute 
to drug tolerance in persisters. 

Figure 1 | A subpopulation in efflux. Persistence is a phenomenon whereby a small fraction of cells in 
a bacterial population survive antibiotic treatment. Pu et al. demonstrate that persister cells upregulate 
production of the TolC protein relative to drug-sensitive cells. TolC is part of a membrane-spanning 
efflux pump that transports antibiotic out of the cell, thus promoting survival. 
(Diagram labels: Drug-sensitive; Persister; Antibiotic.) 

Next, an analysis of cells in which TolC was 
labelled with a fluorescent dye called FlAsH 
revealed that persisters do have higher TolC 
levels than the drug-sensitive subpopulation. 
Moreover, when the authors isolated the 
subpopulation of cells with relatively high TolC 
levels, they found that this fraction contained 
nearly 20 times more persisters than the rest of 
the population. Thus, there is a clear correla- 
tion between a high level of TolC and persis- 
tence. The researchers then tracked cells using 
the FlAsH-labelled TolC: these experiments 
suggested that most persister cells emerged 
from a subpopulation that had increased 
levels of TolC even before treatment with the 
antibiotic. This important result warrants fur- 
ther study, and raises the question of whether 
the molecular mechanism that underlies the 
drug-independent variation of TolC is sepa- 
rate from, or an integral component of, other 
pathways that are already known to regulate 
stochastically induced persistence. 

The current study leaves little doubt that 
TolC is involved in persistence. Most convinc- 
ingly, perhaps, Pu and colleagues showed that 
deletion of the tolC gene or inhibition of TolC 
with a chemical compound drastically reduced 
the level of persisters. Because TolC is an outer- 
membrane protein, such inhibitors can read- 
ily access the protein. These observations 
raise the question of whether it might, in the 
future, be possible to develop therapeutic co- 
drugs that increase the efficacy of conventional 
antibiotics. These could be particularly useful 
for treating chronic and recurrent infections. 

Many other genes have previously been 
implicated in bacterial persistence, including 
toxin—antitoxin (TA) genes. Most type II TA 
genes encode inhibitors of translation — their 
expression might therefore contribute to the 
dormancy of persisters~*. Indeed, deletion of 
several type II TA genes significantly reduces 
persistence in the bacterium Escherichia coli® 
and in a subspecies of Salmonella enterica. The 
small membrane proteins encoded by type I 
TA genes can also induce persistence, by depo- 
larizing the membrane, thereby reducing cellu- 
lar levels of the energy-carrying molecule ATP 
and thus contributing to dormancy’. 

Expression of both type I and II TAs is 
induced stochastically by the signalling mol- 
ecules guanosine tetra- and pentaphosphate, 
and so the two classes might contribute syn- 
ergistically to dormancy by reducing ATP 
levels and protein synthesis, respectively. 
How could the stochastic variation of TolC 
levels observed by Pu et al. fit into the regula- 
tory scheme that controls type I and II TAs? 
Expression of the tolC gene is regulated by 
several transcriptional activators that respond 
to chemical compounds, including antibiot- 
ics, but if expression is also induced stochas- 
tically before chemical stress, TolC might act 
in concert with type I and II TAs to increase 
the drug tolerance of persisters. This would 
be the first example of an active mecha- 
nism contributing to stochastically induced 
multiple-drug tolerance. More research is 
required to resolve this exciting, outstanding 
question. 

A photo shoot of 
plant photosystem II 


In photosynthesis, the plant photosystem II uses the energy in sunlight to oxidize 
water. The high-resolution structure of this crucial supercomplex has now been 
obtained using cryo-electron microscopy. SEE ARTICLE P.69 


ROBERTA CROCE & PENGQI XU 


Photosystem II is the enzyme complex 
that produces the oxygen we breathe. It 
is at the heart of the photosynthesis process, 
and uses the energy of the Sun to extract 
from water the electrons and protons that are 
needed to produce food and fuel. On page 69 
of this issue, Wei et al. report the structure of 
spinach photosystem II, a 1.1-megadalton 
dimeric complex in which each monomer is 
composed of 25 proteins and 133 pigment 
molecules. This structure provides a plethora 
of information to aid our understanding of the 
molecular mechanisms by which light is con- 
verted into chemical energy. 

Photosystem II (PSII) is a membrane- 
embedded modular assembly of pigment- 
protein complexes and is composed of two 
main parts, the core and the outer antenna. 
The core contains the reaction centre in which 
energy is used to drive photochemistry. It has 
an evolutionarily highly conserved protein 
composition in all the organisms that perform 
oxygen-generating photosynthesis’. 

Wei and colleagues’ structure shows that 
both the protein and the pigment organiza- 
tion of the plant PSII core in the membrane 
region are almost identical to those of the core 
of the previously reported’ structure of cyano- 
bacterial PSII; this indicates that the complex 
was optimized long ago and has not changed 
since. Only the peripheral membrane proteins 
that surround the water-splitting catalyst in the 
core are organized differently in plants and 
cyanobacteria. This observation is intriguing, 
because these peripheral proteins are needed 
for oxidizing water’, and their organizational 
differences from their cyanobacterial counter- 
parts might inform how nature has optimized 
this essential reaction. 

Figure 1 | Structure of the plant photosystem II 
supercomplex. Wei et al. report the structure of 
spinach photosystem II as a dimeric supercomplex. 
(Here, one of the monomers is represented as 
ribbons and the other as its surface area.) Of 
the two main parts of each monomer, the core 
complex contains the reaction centre (not shown), 
and the peripheral light-harvesting complexes 
(LHCII, CP29 and CP26) supply excitation energy 
to the reaction centre. PsbW is a core subunit 
typical of plants, which mediates the association of 
LHCII with the core. 


In contrast to the core, the outer antenna 
differs greatly between photosynthetic organ- 
isms. Its role is to increase the core’s capacity 
to harvest light, overcoming the fact that light 
is a dilute form of energy (one molecule of 
the pigment chlorophyll absorbs only a few 
photons per second even on a bright, sunny 
day). The outer antenna is shaped by the host 
organism's adaptation to its ecological niche, 
where light quantity and quality vary, and thus 
it is tailor-made’. In vascular plants (includ- 
ing spinach), the outer antenna is composed of 
light-harvesting complexes (LHCs). These are 
pigment—protein complexes that absorb light 
and transfer part of the corresponding energy 
to the reaction centre. 

In the present structure, each monomer of 
the PSII supercomplex is composed of a core, 
one LHC trimer (called LHCII; ref. 6) and one 
monomer of each of two minor LHCs, CP29 
and CP26. Wei and colleagues’ work offers the 
first structures of the plant PSII core, CP26 and 
the complete CP29. It also shows the position 
of the core subunit PsbW and the molecular 
details of the connection between the antenna 
and the core (Fig. 1). Notably, the long amino- 
terminal region of CP29 extends all the way 
over the CP47 subunit of the core to interact 
with the D1 protein of the reaction centre. This 
organization provides a structural basis for the 
observation’ that, in plants, LHCs are required 
for the connectivity in the core. 

A substantial problem in obtaining 
structural information about plant PSII has 
been its instability, which is a direct conse- 
quence of its functional behaviour. Not only 
must PSII adjust its antenna size in response 
to natural lighting conditions, but it must also 
repair its reaction centre; splitting water using 
sunlight is a risky business that can lead to the 
generation of reactive oxygen species, which 
damage the system. To repair the damage, PSII 
undergoes regular ‘pit stops’, during which it 
is disassembled and reassembled after sub- 
stitution of the damaged part*. Cryo-electron 
microscopy was therefore crucial in solving this 
structure, because it allowed Wei et al. to select 
the intact particles from the ensemble and reach 
an amazing 3.2-ångström resolution. 

Although modularity is required to allow 
repair, a good connection between the subunits 
is equally important for efficient energy transfer 
from the antenna to the core. The absorption 
of a photon promotes chlorophyll to an excited 
state, but this state is unstable and the chloro- 
phyll relaxes to its initial (ground) state within 
a few nanoseconds. Once back to the ground 
state, the energy is lost. Consequently, time is 
limited for using the energy, which must then be 
rapidly transferred to the reaction centre. 

Earlier work showed that in the plant PSII 
supercomplex, it takes 140 picoseconds (1 ps 
is 10⁻¹² s) from the absorption of a photon by a 
chlorophyll molecule to the charge separation 
in the reaction centre, such that more than 90% 
of the photons absorbed produce an electron. 
How fast this energy is transferred from the 
antenna to the reaction centre depends on the 
distance between the pigments, their relative 
orientation and their energy. These factors are 
all dictated by the proteins, which act as smart 
matrices that organize the pigments. 
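
The link between the two timescales and the quoted efficiency can be illustrated with a simple two-channel competition, sketched here in Python. This is a simplified model rather than the analysis used in the earlier work: it treats 140 ps as the effective time constant for reaching charge separation, and it assumes an excited-state lifetime of 2 ns in the absence of trapping (the text says only 'a few nanoseconds'). The function name is hypothetical.

```python
def trapping_yield(trap_time_ps, decay_time_ps):
    """Fraction of excitations that reach charge separation.

    Two first-order channels compete: trapping (rate 1/trap_time_ps)
    and relaxation to the ground state (rate 1/decay_time_ps).
    """
    k_trap = 1.0 / trap_time_ps
    k_decay = 1.0 / decay_time_ps
    return k_trap / (k_trap + k_decay)


# 140 ps is from the text; the decay times are assumed values.
for decay_ps in (1000.0, 2000.0, 4000.0):
    y = trapping_yield(140.0, decay_ps)
    print(f"decay time {decay_ps / 1000:.0f} ns: yield = {y:.1%}")
```

With the assumed 2 ns lifetime the yield is above 90%, in line with the figure quoted above. The same function shows how quickly efficiency would fall if transfer to the reaction centre were slower.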

The LHCs contain two types of chlorophyll 
(a and b) that are chemically similar but 
energetically different. Chlorophyll a absorbs 
lower-energy photons than chlorophyll b. 
Because energy preferentially migrates down- 
stream, chlorophyll b rapidly transfers it to chlo- 
rophyll a; the energy is then transferred to the 
reaction centre mainly by chlorophyll a. 
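
The energy difference between the two pigments can be estimated from the photon-energy relation E = hc/λ. In the sketch below, the wavelengths of 650 nm for chlorophyll b and 675 nm for chlorophyll a are approximate values assumed for their red absorption bands in a protein environment; they are not taken from the structure paper.

```python
PLANCK = 6.62607015e-34        # J s (exact SI value)
LIGHT_SPEED = 299_792_458.0    # m s^-1 (exact SI value)
ELECTRON_VOLT = 1.602176634e-19  # J (exact SI value)


def photon_energy_ev(wavelength_nm):
    """Photon energy in electronvolts for a wavelength in nanometres."""
    wavelength_m = wavelength_nm * 1e-9
    return PLANCK * LIGHT_SPEED / wavelength_m / ELECTRON_VOLT


# Assumed, approximate absorption wavelengths (illustration only).
chl_b = photon_energy_ev(650.0)
chl_a = photon_energy_ev(675.0)
print(f"chlorophyll b: {chl_b:.3f} eV, chlorophyll a: {chl_a:.3f} eV")
print(f"energy step b -> a: {(chl_b - chl_a) * 1000:.0f} meV")
```

The longer wavelength corresponds to the lower energy, so excitation passing from chlorophyll b to chlorophyll a moves downhill in energy.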
Although detailed computational 
modelling based on the new structure is 
needed for a quantitative understanding of 
the excitation-energy transfer, visual inspec- 
tion of how chlorophylls are organized in the 
supercomplex already provides qualitative 
indications. Intriguingly, the interface between 
the LHCs is occupied by chlorophyll b mol- 
ecules located relatively far from each other, 
suggesting that there is little (if any) transfer 
of energy between the LHCs present in this 
supercomplex. Instead, all LHCs seem to trans- 
fer their collected energy directly to the core. 
This organization may seem at odds with 
the required efficiency, because more energy- 
transfer pathways normally result in higher 
efficiency’. In cells, however, PSII is in contact 
with other LHCs, the number of which varies 
under different conditions and which are not 
present in the isolated supercomplex””. The 
presence of chlorophyll a at the periphery of 
the supercomplex can help to transfer energy 
from such additional LHCs to the core. 
Clearly, Wei et al. have provided us with 
a wonderful structure. The ball is now in 
the court of spectroscopists and theoreti- 
cians to use this structure to obtain a detailed 
understanding of the functionality of 
the system. 

Neanderthals built 
underground 


The finding of 175,000-year-old structures deep inside a cave in France suggests 
that Neanderthals ventured underground and were responsible for some of the 
earliest constructions made by hominins. SEE LETTER P.111 


MARIE SORESSI 


Building is a frequent by-product of human 
activity, and some ancient constructions 
remain majestic to this day. However, all 
too often, constructions made by mobile populations 
do not preserve well. Evidence for structures 
made by prehistoric hunter-gatherers is 
scarce and usually consists only of an area of finds 
with intriguing spatial distributions, which may 
be associated with a fireplace. On page 111 of 
this issue, Jaubert et al. report the discovery of 
circular structures made of broken stalagmites 
deep inside a cave in southwest France. The 
structures are up to 40 centimetres high and 
6.7 metres wide, and direct radiometric dating 
shows that they are at least 175,000 years old. 
Because Neanderthals were the only hominin 
group present in western Europe at that time, 
the discovery provides the first directly dated 
evidence for Neanderthals’ construction abilities. 
It also shows that Neanderthals explored 
underground. 

Neanderthals lived in Eurasia from around 
400,000 to 40,000 years ago, at which point 
anatomically modern humans settled in. 
Investigation of the archaeological record from 
the Late Pleistocene epoch — which spanned 
from 126,000 to 11,700 years ago — has pro- 
vided robust data on the behaviour of ancient 
hominins and allowed a comparison of the 
activities of Neanderthals and early modern 
humans. This comparative approach has been 
regularly used to elaborate on the reasons for 
Neanderthals’ demise and the success of early 
modern humans. 

However, given a lack of direct evidence, 
there has been little discussion of the constructional 
abilities of Neanderthals. It is known 
that great apes, birds and other animals build 
elaborate nests (the bowerbird is a famous 
example), and the archaeological record 
contains examples of constructions made 
by anatomically modern humans about 
20,000 years ago, such as collapsed, rounded 
‘ruins’ made from mammoth bones or deer 
antlers. Yet only a few structures interpreted 
as post-holes or isolated elements of dry 
stone walls have been tentatively attributed 
to Neanderthals. Furthermore, differential 
distributions of finds inside and outside potential 
Neanderthal constructions have rarely 
been documented, and even then not always 
convincingly. 

Figure 1 | Ancient structures. Circular structures made from broken stalagmites, found in Bruniquel 
Cave in southwest France by Jaubert et al., are thought to have been made by Neanderthals around 
175,000 years ago. 

Jaubert et al. report accumulations of 
almost 400 stalagmites and stalagmite 
fragments stacked into several structures, 
including two that have a semicircular 
shape, some 300 m from the entrance of 
Bruniquel Cave (Fig. 1). One semicircu- 
lar structure, which is more than 6.7 m 
wide, comprises a ‘wall’ made of up to 
four superimposed layers of stalagmite 
fragments about 30 cm in length, with smaller 
elements stuck obliquely in between. Red- 
dening, blackening and cracking of many 
stalagmites suggest that the structures have 
been heated by small fires. The authors also 
recovered a 6.7-cm fragment of heated bone 
from within one of the smaller structures, 
close to reddened and blackened stalagmites. 
This find, together with measurement of the 
magnetic anomalies in the rock above and 
around the structures, supports the idea that 
the structures were heated. 

The researchers used molecular and 
atomic spectrometry to investigate two 
other probable residues of heated bones, one 
found in a 2-m-wide structure and the other 
forming part of a concentration of similarly 
blackened material discovered on the ground 
and interpreted as a hearth. Seven stalagmites 
from the two largest structures were dated 
using uranium-series measurements; by 
dating the calcite that had grown before and 
after the fragments were broken, the research- 
ers could constrain the date at which the 
stalagmites were used in construction. The cal- 
cite covering the 6.7-cm-long bone and form- 
ing the flowstone (a sheet-like calcite deposit) 
on the floor of the largest structure was 
also dated. 
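
The bracketing logic can be sketched as follows. Calcite that grew before a stalagmite was broken must be older than the construction, and calcite that grew over the broken pieces must be younger, so the construction date lies between the youngest 'before' age and the oldest 'after' age. The ages in the example are hypothetical numbers invented for illustration, the function name is an assumption, and measurement uncertainties are ignored.

```python
def construction_window(ages_before_break, ages_after_break):
    """Return (youngest_possible, oldest_possible) construction age in years.

    ages_before_break: ages of calcite that grew before breakage
                       (each is at least as old as the construction).
    ages_after_break:  ages of calcite that grew after breakage
                       (each is no older than the construction).
    """
    oldest_possible = min(ages_before_break)
    youngest_possible = max(ages_after_break)
    if youngest_possible > oldest_possible:
        raise ValueError("ages are inconsistent: no valid window")
    return youngest_possible, oldest_possible


# Hypothetical ages in years before present (not data from the study).
before = [178_200, 177_100, 179_500]
after = [175_200, 174_800]
print(construction_window(before, after))
```

The more samples that fall close to the moment of breakage on either side, the narrower the window becomes.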

Altogether, the authors dated 18 samples 
from the area containing the structures, 
which show that the structures are around 
176,500 years old (with a confidence inter- 
val of 2,100 years). That period is known to 
have had relatively warm and humid phases, 
which is consistent with the calcite deposition 
observed. The signal of oxygen and carbon 
isotopes reported from the stalagmites is also 
consistent with the atmospheric conditions 
known for that time. 

The inner organization and the size of the 
structures do not fit with what is documented 
for the nests of cave bears, discounting that 
possibility for their construction. Thus, these 
structures are the oldest directly dated con- 
structions attributed to Neanderthals, and the 
first ones for which we can be confident of that 
attribution. Furthermore, no charred materials 
have been found outside the structure, and no 
red- or black-coloured material was observed 
on the cave ceiling above the structure: these 
details support the idea that the colorations 
are indicative of heating in situ and were not 
transported between or onto the stalagmites 
by natural processes. 

Jaubert et al. discuss the social organization 
that would have been needed to manufac- 
ture such structures, and compare this with 
what is known for modern humans from the 
same era. They conclude that their discovery 
indicates that Neanderthals exhibited more- 
complex social behaviour than was previously 
thought, and suggests that these hominins used 
the underground environment. Only further 
discovery of underground structures will help 
to establish whether these structures were 
opportunistic ones relating to an accidental 
underground visit, or whether they were part 
of regular and planned Neanderthal activities. 

These structures are among the best- 
preserved constructions known for the whole 
of the Pleistocene epoch, probably because 
they were sealed by calcite very soon after 
they were erected. When the best evidence is 
found in the best-preserved context, it serves 
as a reminder for archaeologists of how much 
we depend on preservation. The fact that some 
of the art of the period is also often found deep 
inside caves has been alternatively interpreted 
as a testimony of the preservation provided by 
the cave environment’ or as a result of spir- 
itual preoccupations — the underground 
being a special place*. Perhaps we need to fur- 
ther consider the idea that the fuzziness of the 
Neanderthal record is due to a lack of preserva- 
tion. Given that we often discuss archaeologi- 
cal findings in a comparative framework that 
contrasts Neanderthals (which disappeared) 
with early modern humans (who were obvi- 
ously successful), we may also wonder how 
this framework is biased by Western thought. 
European culture is known for having empha- 
sized what may be ‘uniquely human’ and may 
separate ‘us’ from other animals. 

Comparing hominins across a large chunk 
of time is necessary and useful. However, an 
increased focus on reconstructing the his- 
torical context of past behavioural and tech- 
nological innovations may be key to further 
understanding these different populations. 
The structures discovered by Jaubert et al. are 
a good example of how reconstructing ancient 
history may benefit from not only broad-scale 
comparisons of evolution over time but also 
detailed analysis of specific areas at specific 
time points. 

Earth’s core problem 


Measurements of the electrical resistance and thermal conductivity of iron at 
extreme pressures and temperatures cast fresh light on controversial numerical 
simulations of the properties of Earth’s outer core. SEE LETTERS P.95 & 99 


DAVID DOBSON 


Earth’s core acts like a storage heater, with 
heat released during crystallization 
of the inner core that buffers the slow 
cooling of the planet as it radiates its heat to 
space. The most obvious expression of this heat 
transfer is Earth's magnetic field, which is gen- 
erated by convection in the liquid outer core. 
But the magnitude of the transfer is controlled 
by thermal conduction across the boundary 
between the core and mantle. 

In 2012, first-principles numerical simulations indicated that the 
thermal conductivity of liquid iron in the outer core is so high that 
this region might act as a pump that pushes heat towards the 
core-mantle boundary faster than convection can. If, as these 
controversial studies suggest, the core is losing heat at such a high 
rate, it means that the magnetic field must work in previously 
unimagined ways, and that the solid inner core must be less than a 
billion years old, a mere babe in planetary terms. In this issue, 
Ohta et al. (page 95) and Konôpková et al. (page 99) report studies 
that experimentally tested the simulations’ results using 
complementary, but distinct, approaches and come to different 
conclusions. 

Both groups use laser-heated diamond-anvil 
cells to generate the extreme temperatures and 
pressures of the core-mantle boundary, but 
that is where the similarity ends. Ohta et al. 
measured the electrical resistance of iron wires, 
which is closely related to the wires’ thermal 
conductivity (Fig. 1a). To convert the resistiv- 
ity measurements to a measure of the thermal 
conductivity of liquid iron in the outer core, the 
authors fitted their data to a model of resistivity 
that assumes that resistance approaches a limit 
at high temperature (a phenomenon called 
resistivity saturation). This then allowed them 
to use the Wiedemann-Franz relationship 
between resistance and thermal conduction in 
metals to calculate the thermal conductivity. 
Both of these procedures have good theoretical 
bases and are well established for low-pressure 
observations. The observed high electrical 
conductivities resulted in a predicted outer- 
core thermal conductivity of around 90 watts 
per metre per kelvin, which is in reasonable 
agreement with the 2012 simulations. 
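
The conversion from resistivity to thermal conductivity described above can be sketched in a few lines. This is only an illustration of the Wiedemann-Franz relationship, k = L T / rho, and not the authors' analysis: the Lorenz number is the textbook free-electron (Sommerfeld) value, and the resistivity and temperature passed in are hypothetical inputs chosen to show the scale of the result, not values taken from either study.

```python
# Sommerfeld (free-electron) value of the Lorenz number, in W Ohm K^-2.
# Assumption: the true value at core conditions may differ.
LORENZ_NUMBER = 2.44e-8


def thermal_conductivity(resistivity_ohm_m, temperature_k, lorenz=LORENZ_NUMBER):
    """Wiedemann-Franz estimate k = L * T / rho, returned in W m^-1 K^-1."""
    if resistivity_ohm_m <= 0 or temperature_k <= 0:
        raise ValueError("resistivity and temperature must be positive")
    return lorenz * temperature_k / resistivity_ohm_m


# Hypothetical inputs, used only to illustrate the calculation.
k_example = thermal_conductivity(resistivity_ohm_m=1.0e-6, temperature_k=4000.0)
print(f"{k_example:.1f} W/m/K")
```

Because conductivity scales inversely with resistivity at fixed temperature, a resistivity that saturates at high temperature keeps the estimated conductivity high, which is why the saturation assumption matters so much for the final number.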

Figure 1 | Measuring the thermal conductivity of iron at Earth’s core 
conditions. In diamond anvil cells, the pressure generated between the 
tips of diamonds can exceed millions of atmospheres. Lasers can be fired 
through the diamonds to directly heat a sample of a material to 
4,000 kelvin or more. a, Ohta et al. connected electrodes to a sample of 
solid iron and measured its electrical resistance (which is inversely 
proportional to thermal conductivity in metals) at high temperatures and 
pressures. b, In separate experiments, Konôpková et al. pulsed the laser, 
and measured the time taken for heat pulses to diffuse through a solid 
iron sample on the basis of changes in the brightness and wavelength of 
the light emitted from the sample. This allowed them to measure the rate 
of thermal diffusion, which is closely related to thermal conductivity. 

By contrast, Konôpková et al. directly measured thermal conduction by 
watching a heat pulse propagate through a solid iron sample after heating 
with a nanosecond laser pulse (Fig. 1b). The time taken for the pulse to 
pass from the heated side of the sample to the other side, and the 
amplitude difference of the pulse between the two sides, are functions of 
the thermal conductivity of the sample, as well as of the surrounding 
solid medium that transmits pressure from the diamonds to the sample and 
thermally insulates the sample from the diamonds. After some careful 
mathematical modelling of the temperature field in the diamond cell, the 
authors extracted the thermal conductivity of iron from time-resolved 
changes in the brightness and wavelength of the glow from the white-hot 
sample. They obtained a thermal conductivity of about 30 W m⁻¹ K⁻¹, 
similar to early predictions of outer-core conductivity. 

But this leaves us with a conundrum: how to reconcile the high thermal 
conductivity reported by Ohta and colleagues on the basis of resistance 
measurements with the low thermal conductivity measured by Konôpková and 
co-workers. Maybe there were unknown complications with the experiments? 
For example, the extremely short laser pulses used by Konôpková et al. 
might have caused the sample to partially melt for a short period, which 
could have gone unnoticed during the experiment. If so, then the melting 
phase transition would have acted as a thermal buffer (much as the 
crystallization of the inner core buffers Earth’s temperature) and caused 
an apparent decrease in thermal conductivity. This might explain why the 
measured thermal conductivities decrease so strongly with temperature, 
particularly at temperatures approaching the melting temperature. 

Or maybe Ohta et al. underestimated 
the heat loss through the electrodes in their 
experiments, which would mean that the 
average sample temperature was less than the 
measured value. This could have made it look 
as though resistivity was saturating, even if it 
wasn’t. Alternatively, the proportionality constant 
between electrical resistance and thermal 
conduction (the Lorenz number) might 
become strongly temperature dependent at 
the extreme pressures and temperatures of the 
experiment — this would point to previously 
unobserved fundamental physics. 

Despite the discrepancy, these two studies 
are experimental feats, measuring complex 
physical properties of samples smaller than 
a pinhead at pressures greater than 1 million 
atmospheres, and at temperatures above 
4,000 K. The fact that the results agree within a 
factor of three is a remarkable success, but the 
devil is in the detail. The discrepancy makes a 
big difference to estimates of when the inner 
core formed, and hence when Earth gener- 
ated a stable magnetic field — the inner core 
could be as little as 700 million years old, about 
the same age as complex life; or as much as 
3 billion years old, about three-quarters of 
Earth’s age. More experimental and theoretical 
work is needed to resolve the discrepancy and 
hence to constrain the age of the inner core and 
the workings of Earth’s magnetic field. 

Landscape of somatic mutations in 560 
breast cancer whole-genome sequences 

We analysed whole-genome sequences of 560 breast cancers to advance understanding of the driver mutations conferring 
clonal advantage and the mutational processes generating somatic mutations. We found that 93 protein-coding cancer 
genes carried probable driver mutations. Some non-coding regions exhibited high mutation frequencies, but most have 
distinctive structural features probably causing elevated mutation rates and do not contain driver mutations. Mutational 
signature analysis was extended to genome rearrangements and revealed twelve base substitution and six rearrangement 
signatures. Three rearrangement signatures, characterized by tandem duplications or deletions, appear associated with 
defective homologous-recombination-based DNA repair: one with deficient BRCA1 function, another with deficient 
BRCA1 or BRCA2 function; the cause of the third is unknown. This analysis of all classes of somatic mutation across 
exons, introns and intergenic regions highlights the repertoire of cancer genes and mutational processes operating, and 
progresses towards a comprehensive account of the somatic genetic basis of breast cancer. 


The mutational theory of cancer proposes that changes in DNA sequence, 
termed ‘driver’ mutations, confer proliferative advantage on a cell, 
leading to outgrowth of a neoplastic clone¹. Some driver mutations are 
inherited in the germline, but most arise in somatic cells during the 
lifetime of the cancer patient, together with many ‘passenger’ mutations 
not implicated in cancer development¹. Multiple mutational processes, 
including endogenous and exogenous mutagen exposures, aberrant DNA 
editing, replication errors and defective DNA maintenance, are 
responsible for generating these mutations¹⁻³. 

Figure 1 | Cohort and catalogue of somatic mutations in 560 breast 
cancers. a, Catalogue of base substitutions, insertions/deletions, 
rearrangements and driver mutations in 560 breast cancers (sorted by 
total substitution burden). Indel axis limited to 5,000(*). b, Complete 
list of curated driver genes sorted by frequency (descending). Fraction 
of ER-positive (left, total 366) and ER-negative (right, total 194) 
samples carrying a mutation in the relevant driver gene presented in 
grey. log10 P value of enrichment of each driver gene towards the 
ER-positive or ER-negative cohort is provided in black. Highlighted in 
green are genes for which there is new or further evidence supporting 
these as novel breast cancer genes. 

Over the past five decades, several waves of technology have advanced 
the characterization of mutations in cancer genomes. Karyotype 
analysis revealed rearranged chromosomes and copy number 
alterations. Subsequently, loss of heterozygosity analysis, hybridization 
of cancer-derived DNA to microarrays and other approaches provided 
higher resolution insights into copy number changes. Recently, DNA 
sequencing has enabled systematic characterization of the full repertoire 
of mutation types including base substitutions, small insertions/deletions, 
rearrangements and copy number changes, yielding 
substantial insights into the mutated cancer genes and mutational 
processes operative in human cancer. 

As for many cancer classes, most currently available breast cancer 
genome sequences target protein-coding exons. Consequently, there has 
been limited consideration of mutations in untranslated, intronic and 
intergenic regions, leaving central questions pertaining to the 
molecular pathogenesis of the disease unresolved. First, the role of 
activating driver rearrangements forming chimaeric (fusion) 
genes/proteins or relocating genes adjacent to new regulatory regions 
as observed in haematological and other malignancies. Second, the role 
of driver substitutions and indels in non-coding regions of the genome. 
Common inherited variants conferring susceptibility to human disease 
are generally in non-coding regulatory regions and the possibility that 
similar mechanisms operate somatically in cancer was highlighted by the 
discovery of somatic driver substitutions in the TERT gene promoter. 
Third, which mutational processes generate the somatic mutations found 
in breast cancer. Addressing this question has been constrained because 
exome sequences do not inform on genome rearrangements and capture 
relatively few base substitution mutations, thus limiting statistical 
power to extract the mutational signatures imprinted on the genome by 
these processes. 

Here we analyse whole-genome sequences of 560 cases in order to 
address these and other questions and to pave the way to a compre- 
hensive understanding of the origins and consequences of somatic 
mutations in breast cancer. 


Cancer genes and driver mutations 
The whole genomes of 560 breast cancers and non-neoplastic 
tissue from each individual (556 female and 4 male) were 
sequenced (Supplementary Fig. 1, Supplementary Table 1). 
We detected 3,479,652 somatic base substitutions, 371,993 small indels 
and 77,695 rearrangements, with substantial variation in the number 
of each between individual samples (Fig. 1a, Supplementary Table 3). 
Transcriptome sequence, microRNA expression, array-based copy num- 
ber and DNA methylation data were obtained from subsets of cases. 
To identify new cancer genes, we combined somatic substitutions 
and indels in protein-coding exons with data from other series, 
constituting a total of 1,332 breast cancers, and searched for mutation 
clustering in each gene beyond that expected by chance. Five cancer 
genes were found for which evidence was previously absent or equivocal 
(MED23, FOXP1, MLLT4, XBP1, ZFP36L1), or for which the muta- 
tions indicate the gene acts in breast cancer in a recessive rather than in 
a dominant fashion, as previously reported in other cancer types (see 
Supplementary Methods section 7.4 for detailed descriptions). From 
published reports on all cancer types (http://cancer.sanger.ac.uk/census), 
we then compiled a list of 727 human cancer genes (Supplementary 
Table 12). On the basis of driver mutations found previously, we 
defined conservative rules for somatic driver base substitutions and 
indel mutations in each gene and sought mutations conforming to 
these rules in the 560 breast cancers. We identified 916 probable driver 
mutations of these classes (Fig. 1b, Supplementary Table 14, Extended 
Data Fig. 1). 

To explore the role of genomic rearrangements as driver mutations, we 
sought predicted in-frame fusion genes that might create activated, 
dominant cancer genes. We identified 1,278 unique and 39 infrequently 
recurrent in-frame gene fusions (Supplementary Table 15). Many of the 
latter, however, were in regions of high rearrangement density, 
including amplicons and fragile sites, and their recurrence is probably 
attributable to chance. Furthermore, transcriptome sequences from 260 
cancers did not show expression of these fusions and generally 
confirmed the rarity of recurrent in-frame fusion genes. By contrast, 
recurrent rearrangements interrupting the gene footprints of CDKN2A, 
RB1, MAP3K1, PTEN, MAP2K4, ARID1B, FBXW7, MLLT4 and TP53 were found 
beyond the numbers expected from local background rearrangement rates, 
indicating that they contribute to the driver mutation burden of 
recessive cancer genes. Several other recurrently rearranged genomic 
regions were observed, including dominantly acting cancer genes ETV6 
and ESR1 (without consistent elevation in expression levels), L1 
retrotransposition sites and fragile sites. The significance of these 
recurrently rearranged regions remains unclear (Extended Data Fig. 2). 

Incorporation of recurrent copy number changes, including homozy- 
gous deletions and amplifications, generated a total of 1,628 likely 
driver mutations in 93 cancer genes (Fig. 1b). At least one driver was 
identifiable in 95% of cancers. The 10 most frequently mutated genes 
were TP53, PIK3CA, MYC, CCND1, PTEN, ERBB2, ZNF703/FGFR1 
locus, GATA3, RB1 and MAP3K1 (Fig. 1b, Extended Data Fig. 1), and 
these accounted for 62% of drivers. 


Recurrent somatic mutations in non-coding regions 

To investigate non-coding somatic driver substitutions and indels, we 
searched for non-coding genomic regions with more mutations than 
expected by chance (Fig. 2a, Supplementary Table 16, Extended Data Fig. 3). 
The promoter of PLEKHS1 exhibited recurrent mutations at two 
genomic positions (Fig. 2a) within the sequence TTTTGCAAT TGAACA ATTGCAAAA 
(as previously reported). The two mutated bases, within a 6 base pair 
(bp) core motif, are flanked on either side by 9 base pairs of palindromic 
sequence forming inverted repeats. Most cancers with these 
mutations showed many base substitutions of mutational signatures 2 
and 13 that have been attributed to activity of APOBEC DNA-editing 
proteins that target the TCN sequence motif. One of the mutated bases 
is a cytosine in a TCA sequence context (shown above as the reverse 
complement, TGA) at which predominantly C>T substitutions were 
found. The other is a cytosine in ACA context, which showed both 
C>T and C>G mutations. 
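
The inverted-repeat structure described here can be checked directly from the sequence quoted above. The following is a minimal sketch, not code from the study; the function names are hypothetical and only the three sequence segments come from the text.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_inverted_repeat(left_flank, right_flank):
    """True if the right flank is the reverse complement of the left flank."""
    return reverse_complement(left_flank) == right_flank


# Segments of the PLEKHS1 promoter sequence quoted in the text.
left, core, right = "TTTTGCAAT", "TGAACA", "ATTGCAAAA"

print(is_inverted_repeat(left, right))   # the 9-bp flanks pair with each other
print(reverse_complement(core))          # core motif as read on the other strand
print(reverse_complement("TGA"))         # the TCA context targeted by APOBEC
```

Flanks that are reverse complements of one another can base-pair within a single strand, which is what would allow the hairpin with an exposed core motif that the authors propose.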

The TGAACA core sequence was mutated at the same two posi- 
tions at multiple locations elsewhere in the genome (Supplementary 
Table 16c) where the TGAACA core was also flanked by palindromes 
albeit of different sequences and lengths (Supplementary Table 16c). 
These mutations were also usually found in cancers with many sig- 
nature 2 and 13 mutations (Fig. 2a). TGAACA core sequences with 
longer flanking palindromes generally exhibited a higher mutation rate, 
and TGAACA sequences flanked by 9 bp palindromes exhibited an 
~265-fold higher mutation rate than sequences without them (Fig. 2b, 
Supplementary Table 16d). However, additional factors must influence 
the mutation rate because it varied markedly between TGAACA core 
sequences with different palindromes of the same length (Fig. 2c). 
Some TGAACA-inverted repeat sites were in regulatory regions but 
others were intronic or intergenic without functional annotation 
(examples in Supplementary Table 16c) or exonic. The propensity for 
mutation recurrence at specific positions in a distinctive sequence motif 
in cancers with numerous mutations of particular signatures renders it 
plausible that these are hypermutable hotspots, perhaps through 
formation of DNA hairpin structures, which are single-stranded at 
their tips enabling attack by APOBEC enzymes, rather than driver 
mutations. 

Figure 2 | Non-coding analyses of breast cancer genomes. 
a, Distributions of substitution (purple dots) and indel (blue dots) 
mutations within the footprint of five regulatory regions identified as 
being more significantly mutated than expected are provided on the left. 
The proportion of base substitution mutation signatures associated with 
corresponding samples carrying mutations in each of these non-coding 
regions is displayed on the right. b, Mutability of TGAACA/TGTTCA 
motifs within inverted repeats of varying flanking palindromic sequence 
length compared to motifs not within an inverted repeat. c, Variation in 
mutability between loci of TGAACA/TGTTCA inverted repeats with 9 bp 
palindromes. 

Figure 3 | Extraction and contributions of base substitution signatures 
in 560 breast cancers. a, Twelve mutation signatures extracted using 
non-negative matrix factorization. Each signature is ordered by mutation 
class (C>A/G>T, C>G/G>C, C>T/G>A, T>A/A>T, T>C/A>G, T>G/A>C), taking 
immediate flanking sequence into account. For each class, mutations are 
ordered by 5’ base (A, C, G, T) first before 3’ base (A, C, G, T). 
b, The spectrum of base substitution signatures within 560 breast 
cancers. Mutation signatures are ordered (and coloured) according to 
broad biological groups: signatures 1 and 5 are correlated with age of 
diagnosis; signatures 2 and 13 are putatively APOBEC-related; 
signatures 6, 20 and 26 are associated with mismatch-repair deficiency; 
signatures 3 and 8 are associated with homologous-recombination 
deficiency; signatures 18, 17 and 30 have unknown aetiology. For ease of 
reading, this arrangement is adopted for the rest of the manuscript. 
Samples are ordered according to hierarchical clustering performed on 
mutation signatures. Top, absolute numbers of mutations of each 
signature in each sample. Bottom, proportion of each signature in each 
sample. c, Distribution of mutation counts for each signature in 
relevant breast cancer samples. Percentage of samples carrying each 
signature provided above each signature. 

Two recurrently mutated sites were also observed in the promoter 
of TBC1D12 (TBC1 domain family, member 12) (q value 4.5 x 1077) 
(Fig. 2a). The mutations were characteristic of signatures 2 and 13 and 
enriched in cancers with many signature 2 and 13 mutations (Fig. 2a). 
The mutations were within the TBC1D12 Kozak consensus sequence 
(CCCCAGATGGTGGG), shifting it away from the consensus. The 
association with particular mutational signatures suggests that these 
may also be in a region of hypermutability rather than drivers. 
The WDR74 promoter showed base substitutions and indels 
(q value 4.6 x 103) forming a cluster of overlapping mutations 
(Fig. 2a). Coding sequence driver mutations in WDR74 have not been 
reported. No differences were observed in WDR74 transcript levels 
between cancers with WDR74 promoter mutations compared to those 
without. Nevertheless, the pattern of this non-coding mutation cluster, 
with overlapping and different mutation types, is more compatible with 
the possibility of the mutations being drivers. 

Two long non-coding RNAs, MALAT1 (q value 8.7 x 10~'', as previously 
reported) and NEAT1 (q value 2.1 x 10~*) were enriched with 
mutations. Transcript levels were not significantly different between 
mutated and non-mutated samples. Whether these mutations are driv- 
ers or result from local hypermutability is unclear. 


Mutational signatures 

Mutational processes generating somatic mutations imprint particular 
patterns of mutations on cancer genomes, termed signatures. Applying a 
mathematical approach to extract mutational signatures previously 
revealed five base-substitution signatures in breast cancer: 
signatures 1, 2, 3, 8 and 13 (refs 2, 24). Using this method for the 
560 cases revealed twelve signatures, including those previously 
observed and a further seven, of which five have formerly been detected 
in other cancer types (signatures 5, 6, 17, 18 and 20) and two are new 
(signatures 26 and 30) (Fig. 3a, b, 4a, Supplementary Table 21a-c, 
Supplementary Methods section 15). Two indel signatures were also 
found. 
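
Base-substitution signatures of this kind are defined over substitution classes that combine the six pyrimidine-centred mutation types with the immediately flanking 5’ and 3’ bases. A minimal sketch of that bookkeeping follows, as an illustration only; the function names and the bracketed label format are assumptions made here, not part of the study's pipeline.

```python
BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Six classes, each written from the pyrimidine (C or T) of the base pair.
CLASSES = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def all_channels():
    """List the classes ordered by mutation type, then 5' base, then 3' base."""
    return [f"{five}[{cls}]{three}"
            for cls in CLASSES for five in BASES for three in BASES]


def channel(context, alt):
    """Label a substitution given its trinucleotide context and mutant base.

    If the mutated (middle) base is a purine, the event is rewritten on the
    opposite strand so that every label is pyrimidine-centred.
    """
    if len(context) != 3 or alt == context[1]:
        raise ValueError("need a 3-base context and a different mutant base")
    if context[1] in "AG":
        context = context.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


channels = all_channels()
print(len(channels), channels[0], channels[-1])
print(channel("TGA", "A"))  # a G>A change in TGA, relabelled on the other strand
```

Counting the mutations of each sample into these classes gives the count matrix from which a factorization method can then extract signatures.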

Signatures of rearrangement mutational processes have not previ- 
ously been formally investigated. To enable this we adopted a rear- 
rangement classification incorporating 32 subclasses. In many cancer 
genomes, large numbers of rearrangements are regionally clustered, for 
example in zones of gene amplification. Therefore, we first classified 
rearrangements into those inside and outside clusters, further subclassi- 
fied them into deletions, inversions and tandem duplications, and then 
according to the size of the rearranged segment. The final category in 
both groups was interchromosomal translocations. 
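
The classification scheme just described can be sketched as follows. The text gives the structure (clustered or not, then type, then size, with translocations as a final category) and the total of 32 subclasses, but not the size boundaries, so the five size bins used here are an assumption chosen to be consistent with that total; the label format is also hypothetical.

```python
import bisect

# Assumed size boundaries in base pairs (not stated in the text).
SIZE_EDGES = [10_000, 100_000, 1_000_000, 10_000_000]
SIZE_LABELS = ["<10kb", "10-100kb", "100kb-1Mb", "1-10Mb", ">10Mb"]
SIZED_TYPES = ["del", "tds", "inv"]  # deletion, tandem duplication, inversion


def classify(kind, clustered, size_bp=None):
    """Return the subclass label for one rearrangement."""
    group = "clustered" if clustered else "non-clustered"
    if kind == "trans":
        # Interchromosomal translocations have no segment size.
        return f"{group}_trans"
    if kind not in SIZED_TYPES or size_bp is None:
        raise ValueError("sized rearrangements need a known type and a size")
    size = SIZE_LABELS[bisect.bisect_right(SIZE_EDGES, size_bp)]
    return f"{group}_{kind}_{size}"


def all_subclasses():
    """Enumerate every subclass in the scheme."""
    labels = []
    for group in ("clustered", "non-clustered"):
        for kind in SIZED_TYPES:
            labels += [f"{group}_{kind}_{size}" for size in SIZE_LABELS]
        labels.append(f"{group}_trans")
    return labels


print(len(all_subclasses()))
print(classify("tds", clustered=False, size_bp=150_000))
```

With three sized types in five bins plus translocations, each of the two groups has 16 subclasses, giving the 32 in total.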

Application of the mathematical framework used for base substitution 
signatures extracted six rearrangement signatures (Fig. 4b, 
Supplementary Table 21). Unsupervised hierarchical clustering on the 
basis of the proportion of rearrangements attributed to each signature 
in each breast cancer yielded seven major subgroups exhibiting distinct 
associations with other genomic, histological or gene expression fea- 
tures (Fig. 5, Extended Data Figs 4-6). 

Rearrangement signature 1 (9% of all rearrangements) and rearrangement 
signature 3 (18% of rearrangements) were characterized 
predominantly by tandem duplications (Fig. 4b). Tandem duplica- 
tions associated with rearrangement signature 1 were mostly >100 kb 
(Fig. 4b), and those with rearrangement signature 3 were <10 kb 
(Fig. 4b, Extended Data Fig. 7). More than 95% of rearrangement 
signature 3 tandem duplications were concentrated in 15% of 
cancers (cluster D, Fig. 5), many with several hundred rearrangements of 
this type. Almost all cancers (91%) with BRCA1 mutations or promoter 
hypermethylation were in this group, which was enriched for basal- 
like, triple negative cancers and copy number classification of a high 
homologous recombination deficiency (HRD) index. Thus, inactivation 
of BRCA1, but not BRCA2, may be responsible for the rearrangement 
signature 3 small tandem duplication mutator phenotype. 

More than 35% of rearrangement signature 1 tandem duplications 
were found in just 8.5% of the breast cancers and some cases had 
hundreds of these (cluster F, Fig. 5). The cause of this large tandem 
duplication mutator phenotype (Fig. 4b) is unknown. Cancers exhib- 
iting it are frequently TP53-mutated, relatively late diagnosis, triple- 
negative breast cancers, showing enrichment for base substitution 
signature 3 and a high HRD index (Fig. 5), but do not have BRCA1/2 
mutations or BRCA1 promoter hypermethylation. 

Rearrangement signature 1 and 3 tandem duplications (Extended 
Data Fig. 7) were generally evenly distributed over the genome. However, 
there were nine locations at which recurrence of tandem duplications 
was found across the breast cancers and which often showed multiple, 
nested tandem duplications in individual cases (Extended Data Fig. 8). 
These may be mutational hotspots specific for these tandem duplication 
mutational processes, although we cannot exclude the possibility that 
they represent driver events. 

Figure 4 | Additional characteristics of base substitution signatures and 
novel rearrangement signatures in 560 breast cancers. a, Contrasting 
transcriptional strand asymmetry and replication strand asymmetry 
between twelve base substitution signatures. b, Six rearrangement 
signatures extracted using non-negative matrix factorization. Probability 
of rearrangement element on y axis. Rearrangement size on x axis. 
del, deletion; tds, tandem duplication; inv, inversion; trans, translocation. 

Rearrangement signature 5 (accounting for 14% of rearrangements) 
was characterized by deletions <100 kb. It was strongly associated 
with the presence of BRCA1 mutations or promoter hypermethylation 
(cluster D, Fig. 5), BRCA2 mutations (cluster G, Fig. 5) and with 
rearrangement signature 1 large tandem duplications (cluster F, Fig. 5). 

Rearrangement signature 2 (accounting for 22% of rearrangements) 
was characterized by non-clustered deletions (>100 kb), inversions 
and interchromosomal translocations, and was present in most cancers but 
was particularly enriched in oestrogen receptor (ER)-positive cancers 
with quiet copy number profiles (cluster E, GISTIC (genomic 
identification of significant targets in cancer) cluster 3; Fig. 5). 
Rearrangement signature 4 (accounting for 18% of rearrangements) was 
characterized by clustered interchromosomal translocations, whereas 
rearrangement signature 6 (19% of rearrangements) had clustered 
inversions and deletions (clusters A, B, C; Fig. 5). 

Short segments (1-5 bp) of overlapping microhomology characteristic 
of alternative methods of end-joining repair were found at most 
rearrangements. Rearrangement signatures 2, 4 and 6 were characterized 
by a peak at 1 bp of microhomology, whereas rearrangement 
signatures 1, 3 and 5, associated with homologous recombination 
DNA repair deficiency, exhibited a peak at 2 bp (Extended Data Fig. 9). 
Thus, different end-joining mechanisms may operate with different 
rearrangement processes. A proportion of breast cancers showed 
rearrangement signature 5 deletions with longer (>10 bp) microhomologies 
involving sequences from short-interspersed nuclear elements, most 
commonly AluS (63%) and AluY (15%) family repeats (Extended Data 
Fig. 9). Long segments (more than 10 bp) of non-templated sequence 
were particularly enriched amongst clustered rearrangements. 


Localized hypermutation: kataegis 

Focal base-substitution hypermutation, termed kataegis, is generally 
characterized by substitutions with characteristic features of signatures 2 
and 13 (refs 2, 24). Kataegis was observed in 49% of breast cancers, with 
4% exhibiting 10 or more foci (Supplementary Table 21c). Kataegis 
colocalizes with clustered rearrangements characteristic of rearrangement 
signatures 4 and 6 (Fig. 4b). Cancers with tandem duplications or deletions 
of rearrangement signatures 1, 3 and 5 did not usually demonstrate 
kataegis. However, there must be additional determinants of kataegis as 
only 2% of rearrangements are associated with it. A rare (14 out of 1,557 
foci, 0.9%) alternative form of kataegis, colocalizing with rearrange- 
ments but with a base-substitution pattern characterized by T>G and 
T>C mutations, predominantly at NTT and NTA sequences (where 
N can be any base A, T, C or G), was also observed (Extended Data 
Fig. 10). This pattern of base substitutions most closely matches signature 
9 (Extended Data Fig. 10; http://cancer.sanger.ac.uk/cosmic/signatures), 
previously observed in B lymphocyte neoplasms and attributed to 
polymerase eta activity”. 


Mutational signatures exhibit distinct DNA replication 
strand biases 

The distributions of mutations attributable to each of the 20 muta- 
tional signatures (12 base substitution, 2 indel and 6 rearrangement) 
were explored with respect to DNA replication strand. We found an 
asymmetric distribution of mutations between leading and lagging 
replication strands for many, but not all, signatures (Fig. 4a). Notably,
signatures 2 and 13, owing to APOBEC deamination, showed marked 
lagging-strand replication bias (Fig. 4a) suggesting that lagging-strand 
replication provides single-stranded DNA for APOBEC deamination. 
Of the three signatures associated with mismatch-repair deficiency 
(signatures 6, 20 and 26), only signature 26 exhibited replicative-strand 
bias, highlighting how different signatures arising from defects of the 
same pathway can exhibit distinct relationships with replication. 


Mutational signatures associated with BRCA1 and 
BRCA2 mutations 

Of the 560 breast cancers, 90 had germline (60) or somatic (14) inac- 
tivating mutations in BRCA1 (35) or BRCA2 (39) or showed methyla- 
tion of the BRCA1 promoter (16). Loss of the wild-type chromosome 
17 or 13 was observed in 80 out of 90 cases. The latter exhibited 
many base substitution mutations of signature 3, accompanied by 
deletions of >3 bp with microhomology at rearrangement break- 
points, and signature 8 together with CC>AA double nucleotide 
substitutions. Cases in which the wild-type chromosome 17 or 13 
was retained did not show these signatures. Thus signature 3 and, 
to a lesser extent, signature 8 are associated with absence of BRCA1 
and BRCA2 functions.

Figure 5 | Integrative analysis of rearrangement signatures. Heatmap of
rearrangement signatures following unsupervised hierarchical clustering based
on proportions of rearrangement signatures in each cancer. Seven cluster
groups (A-G) noted and relationships

Cancers with inactivating BRCA1 or BRCA2 mutations usually carry
many genomic rearrangements. Cancers with BRCA1, but not BRCA2, 
mutations exhibit large numbers of rearrangement signature 3 small 
tandem duplications. Cancers with BRCA1 or BRCA2 mutations show 
substantial numbers of rearrangement signature 5 deletions. No other 
rearrangement signatures were associated with BRCA1- or BRCA2-null 
cases (clusters D and G, Fig. 5). Some breast cancers without identifiable
BRCA1/2 mutations or BRCA1 promoter methylation showed these
features and segregated with BRCA1/2-null cancers in hierarchical
clustering analysis (Fig. 5). In such cases, the BRCA1/2 mutations may
have been missed or other mutated or promoter methylated genes may be
exerting similar effects (see http://cancer.sanger.ac.uk/cosmic/sample/genomes
for examples of whole-genome profiles of typical BRCA1-null (for example,
PD6413a, PD7215a) and BRCA2-null tumours (for example, PD4952a, PD4955a)).

A further subset of cancers (cluster F, Fig. 5) show similarities in 
mutational pattern to BRCA1/2-null cancers, with many rearrangement
signature 5 deletions and enrichment for base substitution signatures 3 
and 8. However, these do not segregate together with BRCA1/2-null 
cases in hierarchical clustering analysis, have rearrangement signature 1 
large tandem duplications and do not show BRCA1/2 mutations. 
Somatic and germline mutations in genes associated with the DNA
double-strand break repair pathway, including ATM, ATR, PALB2,
RAD51C, RAD50, TP53, CHEK2 and BRIP1, were sought in these cancers.
We did not observe any clear-cut relationships between mutations
in these genes and these mutational patterns.

Cancers with BRCA1/2 mutations are particularly responsive to cisplatin
and PARP inhibitors. Combinations of base substitution, indel
and rearrangement mutational signatures may be better biomarkers
of defective homologous-recombination-based DNA double-strand
break repair and responsiveness to these drugs than BRCA1/2 mutations
or promoter methylation alone and thus may constitute the basis
of future diagnostics.


Conclusions 

A comprehensive perspective on the somatic genetics of breast cancer
is drawing closer (see http://cancer.sanger.ac.uk/cosmic/sample/genomes
for individual patient genome profiles, and Methods for
orientation). At least 12 base substitution mutational signatures and 6
rearrangement signatures contribute to the somatic mutations found,
and 93 mutated cancer genes (31 dominant, 60 recessive, 2 uncertain)
are implicated in genesis of the disease. However, dominantly
acting activated fusion genes and non-coding driver mutations appear
rare. Additional infrequently mutated cancer genes probably exist.
However, the genes harbouring the substantial majority of driver
mutations are now known.

Nevertheless, important questions remain to be addressed. Recurrent 
mutational events including whole-chromosome copy number changes 
and unexplained regions with recurrent rearrangements could harbour 
additional cancer genes. Identifying non-coding drivers is challenging 
and requires further investigation. Although almost all breast cancers 
have at least one identifiable driver mutation, the number with only 
a single identified driver is perhaps surprising. The roles of viruses 
or other microbes have not been exhaustively examined. Thus, fur- 
ther exploration and analysis of whole-genome sequences from breast 
cancer patients will be required to complete our understanding of the 
somatic mutational basis of the disease. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper.

METHODS


Data reporting. No statistical methods were used to predetermine sample size. 
The experiments were not randomized and the investigators were not blinded to 
allocation during experiments and outcome assessment. 

Sample selection. DNA was extracted from 560 breast cancers and normal tissue 
(peripheral blood lymphocytes, adjacent normal breast tissue or skin) and total 
RNA extracted from 268 of the same individuals. Samples were subjected to pathology
review and only samples assessed as being composed of >70% tumour cells
were accepted for inclusion in the study (Supplementary Table 1).

Massively parallel sequencing and alignment. Short insert 500 bp genomic librar- 
ies and 350 bp poly-A-selected transcriptomic libraries were constructed, flowcells 
prepared and sequencing clusters generated according to Illumina library
protocols. We performed 108 base/100 base (genomic), or 75 base (transcriptomic)
paired-end sequencing on Illumina GAIIx, HiSeq 2000 or HiSeq 2500 genome
analysers, in accordance with the Illumina Genome Analyzer operating manual.
The average sequence coverage was 40.4-fold for tumour samples and 30.2-fold 
for normal samples (Supplementary Table 2). 

Short insert paired-end reads were aligned to the reference human genome 
(GRCh37) using Burrows-Wheeler Aligner, BWA (v0.5.9). RNA sequencing data
were aligned to the human reference genome (GRCh37) using TopHat (v1.3.3)
(http://ccb.jhu.edu/software/tophat/index.shtml). 

Processing of genomic data. CaVEMan (Cancer Variants Through Expectation 
Maximization: http://cancerit.github.io/CaVEMan/) was used for calling somatic 
substitutions. 

Indels in the tumour and normal genomes were called using a modified Pindel 
version 2.0 (http://cancerit.github.io/cgpPindel/) on the NCBI37 genome build.

Structural variants were discovered using a bespoke algorithm, BRASS 
(BReakpoint AnalySiS) (https://github.com/cancerit/BRASS) through discor- 
dantly mapping paired-end reads. Next, discordantly mapping read pairs that 
were likely to span breakpoints, as well as a selection of nearby properly paired 
reads, were grouped for each region of interest. Using the Velvet de novo
assembler, reads were locally assembled within each of these regions to produce a
contiguous consensus sequence of each region. Rearrangements, represented by 
reads from the rearranged derivative as well as the corresponding non-rearranged 
allele were instantly recognizable from a particular pattern of five vertices in the 
de Bruijn graph (a mathematical method used in de novo assembly of (short) 
read sequences) component of Velvet. Exact coordinates and features of junc- 
tion sequence (for example, microhomology or non-templated sequence) were 
derived from this, following aligning to the reference genome, as though they 
were split reads. 

See Supplementary Table 3 for summary of somatic variants. Annotation was 
according to ENSEMBL version 58. 

Single nucleotide polymorphism (SNP) array hybridization using the 
Affymetrix SNP6.0 platform was performed according to Affymetrix protocols. 
Allele-specific copy number analysis of tumours was performed using ASCAT 
(v2.1.1), to generate integral allele-specific copy number profiles for the tumour 
cells (Supplementary Tables 4 and 5). ASCAT was also applied to next-generation
sequencing data directly with highly comparable results. 

We sampled 12.5% of the breast cancers for validation of substitutions, indels 
and/or rearrangements in order to make an assessment of the positive predictive 
value of mutation calling (Supplementary Table 6). 

Further details of these processing steps as well as processing of transcriptomic
and miRNA data (Supplementary Tables 7 and 8) can be found in Supplementary
Methods.

Identification of novel breast cancer genes. To identify recurrently mutated 
driver genes, a dN/dS method that considers the mutation spectrum, the sequence 
of each gene, the impact of coding substitutions (synonymous, missense,
nonsense, splice site) and the variation of the mutation rate across genes was
used for substitutions (Supplementary Table 9). Owing to the lack of a neutral 
reference for the indel rate in coding sequences, a different approach was required 
(Supplementary Table 10, Supplementary Methods for details). To detect genes 
under significant selective pressure by either point mutations or indels, for each 
gene, the P values from the dN/dS analysis of substitutions and from the recur- 
rence analysis of indels were combined using Fisher’s method. Multiple testing 
correction (Benjamini-Hochberg FDR) was performed separately for the more 
than 600 putative driver genes and for all other genes, stratifying the FDR cor- 
rection to increase sensitivity (as described in ref. 54). To achieve a low false 
discovery rate, a conservative q-value cutoff of <0.01 was used to determine 
statistical significance (Supplementary Table 11). 
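
The combination and stratified correction steps described above can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the function names, the gene labels and the example P values are hypothetical. It assumes the two P values per gene are independent and strictly positive, and uses the closed-form upper tail of the chi-squared distribution with 4 degrees of freedom that Fisher's method yields when two P values are combined.

```python
import math

def fisher_combine(p_subs, p_indels):
    """Combine two independent P values (both > 0) with Fisher's method.

    The statistic x = -2 * (ln p1 + ln p2) is chi-squared with 4 degrees of
    freedom, whose upper tail is exp(-x / 2) * (1 + x / 2).
    """
    x = -2.0 * (math.log(p_subs) + math.log(p_indels))
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as the input."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone q-values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        q[i] = running_min
    return q

def stratified_hits(p_by_gene, candidate_genes, cutoff=0.01):
    """Apply the FDR correction separately to candidate genes and all others."""
    strata = (
        [g for g in p_by_gene if g in candidate_genes],
        [g for g in p_by_gene if g not in candidate_genes],
    )
    hits = []
    for genes in strata:
        q_values = benjamini_hochberg([p_by_gene[g] for g in genes])
        hits.extend(g for g, q in zip(genes, q_values) if q < cutoff)
    return sorted(hits)
```

Correcting the smaller candidate stratum on its own means a known driver gene is not penalized by the many thousands of tests performed on the rest of the genome, which is the sensitivity gain the stratification is intended to provide.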

This analysis was applied to the new whole-genome sequences of 560 breast 
cancers as well as a further 772 breast cancers that have been sequenced previously 
by other institutions. 

See Supplementary Methods for detailed explanations of these methods.

Recurrence in the non-coding regions. Partitioning the genome into functional
regulatory elements/gene features. To identify non-coding regions with significant 
recurrence, we used a method similar to the one described for searching for novel 
indel drivers (see Supplementary Methods for detailed description). 

The genome was partitioned according to different sets of regulatory elements/
gene features, with a separate analysis performed for each set of elements,
including exons (n = 20,245 genes), core promoters (n = 20,245 genes, where a core
promoter is the interval (−250, +250) bp from any transcription start site (TSS)
of a coding transcript of the gene, excluding any overlap with coding regions),
5′ UTR (n = 9,576 genes), 3′ UTR (n = 19,502 genes), intronic regions flanking exons
(n = 20,212 genes, representing any intronic sequence within 75 bp from an exon,
excluding any base overlapping with any of the above elements), any other sequence
within genes (n = 18,591 genes; for every protein-coding gene, this contains any
region within the start and end of transcripts not included in any of the above
categories), non-coding RNAs (ncRNAs) (n = 10,684, full-length lincRNAs, miRNAs
or rRNAs), enhancers (n = 194,054) and ultra-conserved regions (n = 187,057, a
collection of regions under negative selection based on 1,000 genomes data).

Every element set listed above was analysed separately to allow for different
mutation rates across element types and to stratify the FDR correction. Within
each set of elements, we used a negative binomial regression approach to learn
the underlying variation of the mutation rate across elements. The offset reflects
the expected number of mutations in each element assuming uniform mutation
rates across them (that is, $E_{\mathrm{subs,element}} = \sum_{j \in \{1,2,\ldots,192\}} (r_j S_j t)$ and $E_{\mathrm{indels,element}} = \mu_{\mathrm{indel}} S_{\mathrm{indel,element}}$)
(see Supplementary Methods 7 for a detailed description and definition
of all parameters). As covariate here we used the local density of mutations in
neighbouring non-coding regions, corrected for sequence composition and trinu- 
cleotide mutation rates (that is, the t parameter of the dN/dS equations described 
in section 7.1 of Supplementary Methods). Normalized local rates were pre- 
calculated for 100 kb non-overlapping bins of the genome and used in all analyses. 
Other covariates (expression, replication time or Hi-C (genome-wide chromosome 
conformation capture)) were not used here as they were not found to substantially 
improve the model once the local mutation rate was used as a covariate. A separate 
regression analysis was performed for substitutions and indels, to account for the 
different level of uncertainty in the distribution of substitution and indel rates 
across elements. 


model_subs = glm.nb(formula = n_subs ~ offset(log(E_subs)) + μ_local,subs)

model_indels = glm.nb(formula = n_indels ~ offset(log(E_indels)) + μ_local,indels)


The observed counts for each element ($n_{\mathrm{subs,element}}$ and $n_{\mathrm{indels,element}}$) are compared
to the background distributions using a negative binomial test, with the
overdispersion parameters ($\theta_{\mathrm{subs}}$ and $\theta_{\mathrm{indels}}$) estimated by the negative binomial
regression, yielding P values for substitution and indel recurrence for each element. 
These P values were combined using Fisher’s method and corrected for multiple 
testing using FDR (Supplementary Table 16a). 
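
As an illustration of the negative binomial test, the sketch below computes the probability of observing at least a given number of mutations in an element, given the expected count from the regression and an overdispersion parameter. It is a simplified stand-in rather than the code used in the study: the function names are hypothetical, and the parameterization (variance = mean + mean²/theta, as in common negative binomial regression software) is an assumption.

```python
import math

def nb_log_pmf(k, mean, theta):
    """Log probability of count k under a negative binomial distribution.

    Parameterized by its mean and an overdispersion (size) parameter theta,
    so that the variance equals mean + mean**2 / theta.
    """
    return (
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mean))
        + k * math.log(mean / (theta + mean))
    )

def nb_upper_tail(observed, mean, theta):
    """P(X >= observed): chance of seeing at least this many mutations."""
    below = sum(math.exp(nb_log_pmf(k, mean, theta)) for k in range(observed))
    # Subtracting from 1 loses precision for extremely small P values;
    # this is acceptable for a sketch but not for genome-wide screening.
    return max(0.0, 1.0 - below)
```

For example, `nb_upper_tail(12, mean=2.0, theta=1.5)` gives the recurrence P value for an element with 12 observed mutations where 2 were expected; a smaller theta (greater overdispersion) makes the same excess less surprising.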

Partitioning the genome into discrete bins. We performed a genome-wide screening 
of recurrence in 1 kb non-overlapping bins. We employed the method described in
the preceding section, using as covariate the local mutation rate calculated from 5 Mb
upstream and downstream of the bin of interest and excluding any low-coverage region
from the estimate (Supplementary Table 16b, Extended Data Fig. 3a for example). 
Significant hits were subjected to manual curation to remove false positives caused 
by sequencing or mapping artefacts. 

Mutational signatures analysis. Mutational signatures analysis was performed 
following a three-step process: (i) hierarchical de novo extraction based on somatic 
substitutions and their immediate sequence context, (ii) updating the set of con- 
sensus signatures using the mutational signatures extracted from breast cancer 
genomes, and (iii) evaluating the contributions of each of the updated consensus 
signatures in each of the breast cancer samples. These three steps are discussed in
more detail in the next sections.

Hierarchical de novo extraction of mutational signatures. The mutational catalogues 
of the 560 breast cancer whole genome sequences were analysed for mutational 
signatures using a hierarchical version of the Wellcome Trust Sanger Institute
mutational signatures framework. Briefly, we converted all mutation data into a matrix,
M, that is made up of 96 features comprising mutation counts for each mutation
type (C>A, C>G, C>T, T>A, T>C, and T>G; all substitutions are referred to
by the pyrimidine of the mutated Watson-Crick base pair) using each possible
5′ (C, A, G, and T) and 3′ (C, A, G, and T) context for all samples. After conversion,
the previously developed algorithm was applied in a hierarchical manner
to the matrix M that contains K mutation types and G samples. The algorithm 
deciphers the minimal set of mutational signatures that optimally explains the 
proportion of each mutation type and then estimates the contribution of each 
signature across the samples. More specifically, the algorithm makes use of a
well-known blind source separation technique, termed non-negative matrix
factorization (NNMF). NNMF identifies the matrix of mutational signatures, P, and the
matrix of the exposures of these signatures, E, by minimizing a Frobenius norm,
while maintaining non-negativity:

$$\min_{P \in M_{\mathbb{R}_+}^{(K,N)},\; E \in M_{\mathbb{R}_+}^{(N,G)}} \lVert M - PE \rVert_F^2$$

K is the number of mutation types (that is, 96), and K is the number of mutation
types after dimensionality reduction. $P \in M_{\mathbb{R}_+}^{(K,N)}$ is a matrix of real non-negative
numbers of dimension K × N. $E \in M_{\mathbb{R}_+}^{(N,G)}$ is a matrix of real non-negative numbers
of dimension N × G. The method for deciphering mutational signatures,
including evaluation with simulated data and list of limitations, can be found in 
ref. 25. The framework was applied in a hierarchical manner to increase its ability 
to find mutational signatures present in few samples as well as mutational signatures 
exhibiting a low mutational burden. More specifically, after application to the orig- 
inal matrix M containing 560 samples, we evaluated the accuracy of explaining the 
mutational patterns of each of the 560 breast cancers with the extracted mutational 
signatures. All samples that were well-explained by the extracted mutational sig- 
natures were removed and the framework was applied to the remaining sub-matrix 
of M. This procedure was repeated until the extraction process did not reveal any 
new mutational signatures. Overall, the approach extracted 12 unique mutational 
signatures operative across the 560 breast cancers (Fig. 3, Supplementary Table 21). 
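
The factorization at the core of this step can be illustrated with a minimal sketch. It is not the framework used in the study, which additionally involves resampling, selection of the number of signatures and the hierarchical re-application described above. The sketch assumes the standard multiplicative update rules for the Frobenius objective, and the function names, the random initialization and the small constant `eps` are choices made here for illustration only.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_sq(m, p, e):
    """Squared Frobenius norm of M - PE, the quantity being minimized."""
    pe = matmul(p, e)
    return sum((m[i][j] - pe[i][j]) ** 2
               for i in range(len(m)) for j in range(len(m[0])))

def nmf(m, n_signatures, iterations=500, seed=0, eps=1e-12):
    """Factorize M (K x G) into non-negative P (K x N) and E (N x G)."""
    rng = random.Random(seed)
    k, g = len(m), len(m[0])
    p = [[rng.random() + 0.1 for _ in range(n_signatures)] for _ in range(k)]
    e = [[rng.random() + 0.1 for _ in range(g)] for _ in range(n_signatures)]
    for _ in range(iterations):
        # E <- E * (P^T M) / (P^T P E), element-wise
        pt = transpose(p)
        num, den = matmul(pt, m), matmul(matmul(pt, p), e)
        e = [[e[a][b] * num[a][b] / (den[a][b] + eps) for b in range(g)]
             for a in range(n_signatures)]
        # P <- P * (M E^T) / (P E E^T), element-wise
        et = transpose(e)
        num, den = matmul(m, et), matmul(p, matmul(e, et))
        p = [[p[a][b] * num[a][b] / (den[a][b] + eps)
              for b in range(n_signatures)] for a in range(k)]
    return p, e
```

Because every update multiplies a non-negative entry by a non-negative ratio, P and E remain non-negative throughout without any explicit projection, which is why this family of updates suits count data such as mutation catalogues.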
Updating the set of consensus mutational signatures. The 12 hierarchically extracted 
breast cancer signatures were compared to the census of consensus mutational 
signatures. Of the 12 signatures, 11 closely resembled previously identified
mutational patterns. The patterns of these 11 signatures, weighted by the numbers of
mutations contributed by each signature in the breast cancer data, were used to 
update the set of consensus mutational signatures as previously performed in 
ref. 25. One of the 12 extracted signatures is novel and at present, unique for breast 
cancer. This novel signature is consensus signature 30 (http://cancer.sanger.ac.uk/ 
cosmic/signatures). 

Evaluating the contributions of consensus mutational signatures in 560 breast cancers. 
The complete compendium of consensus mutational signatures that was found 
in breast cancer includes: signatures 1, 2, 3, 5, 6, 8, 13, 17, 18, 20, 26, and 30. We 
evaluated the presence of all of these signatures in the 560 breast cancer genomes 
by re-introducing them into each sample. More specifically, the updated set of 
consensus mutational signatures was used to minimize the constrained linear 
function for each sample: 


$$\min \left\lVert m - \sum_{i=1}^{N} (p_i e_i) \right\rVert^2$$

Here, m is a vector with 96 components corresponding to the counts of each of the
mutation types in a sample, $p_i$ represents a vector with 96 components (corresponding
to a consensus mutational signature i), $e_i$ is a non-negative scalar reflecting the
number of mutations contributed by signature i in that sample. N is equal to 12 
and it reflects the number of all possible signatures that can be found in a single 
breast cancer sample. Mutational signatures that did not contribute large numbers 
(or proportions) of mutations or that did not significantly improve the correlation 
between the original mutational pattern of the sample and the one generated by 
the mutational signatures were excluded from the sample. This procedure reduced 
over-fitting the data and allowed only the essential mutational signatures to be 
present in each sample (Supplementary Table 21b). 
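
The per-sample fit can be sketched as a non-negative least squares problem with the signatures held fixed. The example below is an assumption-laden illustration rather than the procedure used in the study: it uses multiplicative updates as the solver, the function name is hypothetical, and it omits the subsequent exclusion of signatures that contribute little to a sample.

```python
def fit_exposures(m, signatures, iterations=2000, eps=1e-12):
    """Estimate non-negative exposures e_i reducing ||m - sum_i p_i e_i||^2.

    m: mutation counts per mutation type for one sample.
    signatures: list of N vectors p_i, each the same length as m.
    """
    n = len(signatures)
    e = [sum(m) / n] * n  # flat, strictly positive starting point
    # Precompute the inner products p_i . m and p_i . p_j.
    pm = [sum(a * b for a, b in zip(p, m)) for p in signatures]
    pp = [[sum(a * b for a, b in zip(p, q)) for q in signatures]
          for p in signatures]
    for _ in range(iterations):
        # All exposures are updated simultaneously from the previous values.
        e = [e[i] * pm[i] / (sum(pp[i][j] * e[j] for j in range(n)) + eps)
             for i in range(n)]
    return e
```

With two toy signatures that share no mutation types, `[0.5, 0.5, 0, 0]` and `[0, 0, 0.5, 0.5]`, and a sample `[5, 5, 2, 2]`, the fit returns exposures close to 10 and 4 mutations respectively.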

Kataegis. Kataegis, or foci of localized hypermutation, has been previously
defined as 6 or more consecutive mutations with an average intermutation
distance of less than or equal to 1,000 bp. Kataegis foci were sought in 560
whole-genome sequenced breast cancers from high-quality base substitution data using
the method described previously. This method likely misses some foci of kataegis,
sacrificing sensitivity of detection for a higher positive predictive value of kataegic
foci (Supplementary Table 21c). 
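
The definition above can be turned into a simple scan, sketched below for the mutations of one chromosome in one sample. This greedy window is an assumption made for illustration and is not the previously described detection method that was applied in the study; the function name and its defaults are hypothetical, with the defaults taken from the stated definition.

```python
def find_kataegis_foci(positions, min_mutations=6, max_mean_distance=1000):
    """Return (start, end, n_mutations) for runs of consecutive mutations
    whose average intermutation distance is at most max_mean_distance bp."""
    pos = sorted(positions)
    foci = []
    i = 0
    while i + min_mutations <= len(pos):
        j = i + min_mutations - 1
        if (pos[j] - pos[i]) / (j - i) <= max_mean_distance:
            # Extend the run while the mean spacing stays within the limit.
            while (j + 1 < len(pos)
                   and (pos[j + 1] - pos[i]) / (j + 1 - i) <= max_mean_distance):
                j += 1
            foci.append((pos[i], pos[j], j - i + 1))
            i = j + 1
        else:
            i += 1
    return foci
```

Six substitutions at positions 100, 300, 500, 900, 1,200 and 1,500 have a mean spacing of 280 bp and are reported as one focus, whereas mutations spaced 10 kb apart produce none.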

Rearrangement signatures. Clustered vs non-clustered rearrangements. We sought 
to separate rearrangements that occurred as focal catastrophic events or focal driver 
amplicons from genome-wide rearrangement mutagenesis using a piecewise con- 
stant fitting method. For each sample, both breakpoints of each rearrangement were 
considered individually and all breakpoints were ordered by chromosomal position. 
The inter-rearrangement distance, defined as the number of base pairs from one rear- 
rangement breakpoint to the one immediately preceding it in the reference genome, 
was calculated. Putative regions of clustered rearrangements were identified as having 
an average inter-rearrangement distance that was at least 10 times shorter than the
whole-genome average for the individual sample. Piecewise constant fitting
parameters used were γ = 25 and k_min = 10, with γ as the parameter that controls
smoothness of segmentation, and k_min the minimum number of breakpoints in a segment.


The respective partner breakpoint of all breakpoints involved in a clustered 
region are likely to have arisen at the same mechanistic instant and so were con- 
sidered as being involved in the cluster even if located at a distant chromosomal 
site. The rearrangements within clusters (‘clustered’) and not within clusters (‘non- 
clustered’) are summarized in Extended Data Table 4. 

Classification: types and size. In both classes of rearrangements, clustered and 
non-clustered, rearrangements were subclassified into deletions, inversions and 
tandem duplications, and then further subclassified according to size of the
rearranged segment (1-10 kb, 10-100 kb, 100 kb-1 Mb, 1-10 Mb, more than 10 Mb).
The final category in both groups was interchromosomal translocations. 
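
The classification scheme can be made concrete with a short sketch that enumerates the categories: three rearrangement types in five size classes plus translocations gives 16 categories, and applying this to clustered and non-clustered rearrangements gives 32. The category labels, the half-open treatment of size boundaries and the handling of events shorter than 1 kb are assumptions made here for illustration.

```python
SIZE_BINS = [
    ("1-10kb", 1_000, 10_000),
    ("10-100kb", 10_000, 100_000),
    ("100kb-1Mb", 100_000, 1_000_000),
    ("1-10Mb", 1_000_000, 10_000_000),
    (">10Mb", 10_000_000, float("inf")),
]
SIZED_TYPES = ("del", "tds", "inv")  # deletion, tandem duplication, inversion

def all_categories():
    """Enumerate every rearrangement category in a fixed order."""
    categories = []
    for cluster in ("clustered", "non-clustered"):
        for sv_type in SIZED_TYPES:
            for label, _, _ in SIZE_BINS:
                categories.append(f"{cluster}_{sv_type}_{label}")
        categories.append(f"{cluster}_trans")
    return categories

def classify(sv_type, size_bp, clustered):
    """Assign one rearrangement to its category, or None if under 1 kb."""
    cluster = "clustered" if clustered else "non-clustered"
    if sv_type == "trans":
        return f"{cluster}_trans"  # interchromosomal events have no size class
    if sv_type not in SIZED_TYPES:
        raise ValueError(f"unknown rearrangement type: {sv_type}")
    for label, low, high in SIZE_BINS:
        if low <= size_bp < high:
            return f"{cluster}_{sv_type}_{label}"
    return None  # below the smallest listed size class
```

Counting the categories returned by `classify` for every rearrangement in a sample yields one column of the matrix that is then decomposed.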
Rearrangement signatures by NNMF. The classification produces a matrix of 32 dis- 
tinct categories of structural variants across 544 breast cancer genomes. This matrix 
was decomposed using the previously developed approach for deciphering muta- 
tional signatures by searching for the optimal number of mutational signatures that 
best explains the data without over-fitting the data (Supplementary Table 21d, e).
Consensus clustering of rearrangement signatures. To identify subgroups of samples 
sharing similar combinations of six identified rearrangement signatures derived 
from whole genome sequencing analysis we performed consensus clustering using 
the ConsensusClusterPlus R package. Input data for each sample (n = 544, a
subset of the full sample cohort) was the proportion of rearrangements assigned 
to each of the six signatures. Thus, each sample has 6 data values, with a total sum 
of 1. Proportions for each signature were mean-centred across samples before 
clustering. The following settings were used in the consensus clustering: number 
of repetitions = 1000; pItem = 0.9 (resampling frequency of samples); pFeature = 0.9
(resampling frequency of features); Pearson distance metric; Ward linkage method.
Distribution of mutational signatures relative to genomic architecture. 
Following extraction of mutational signatures and quantification of the exposures 
(or contributions) of each signature to each sample, a probability for each mutation 
belonging to each mutational signature (for a given class of mutation, for example,
substitutions) was assigned.
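
One common way to obtain such a probability, shown here as an assumption rather than as the method of the cited reference, is to weight each signature's probability of generating the observed mutation type by that signature's exposure in the sample and then normalize across signatures. The function name and the toy numbers in the usage note are hypothetical.

```python
def signature_probabilities(signatures, exposures, mutation_type):
    """Probability that a mutation of the given type arose from each signature.

    signatures: list of vectors; signatures[i][t] is the probability that
                signature i generates mutation type t.
    exposures:  number of mutations attributed to each signature in the sample.
    mutation_type: index t of the observed mutation type.
    """
    weights = [p[mutation_type] * e for p, e in zip(signatures, exposures)]
    total = sum(weights)
    if total == 0:
        return [0.0] * len(weights)  # no active signature explains this type
    return [w / total for w in weights]
```

For two toy signatures `[0.8, 0.2]` and `[0.1, 0.9]` with exposures of 100 and 50 mutations, a mutation of type 0 is assigned to the first signature with probability 80/85, about 0.94.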

The distribution of mutations assigned to each signature was assessed across multiple
genomic features including replication time and strands, transcriptional strands and
nucleosome occupancy. See ref. 42 for technical details and per-signature results.
Individual patient whole-genome profiles. Breast cancer whole-genome profiles
were adapted from the R Circos package. See
http://cancer.sanger.ac.uk/cosmic/sample/genomes for individual patient genome
profiles. Features depicted in circos plots from outermost rings heading inwards:
karyotypic ideogram outermost. Base substitutions next, plotted as rainfall plots
(log10 intermutation distance on radial axis; dot colours: blue, C>A; black, C>G;
red, C>T; grey, T>A; green, T>C; pink, T>G). Ring with short green lines, insertions;
ring with short red lines, deletions. Major copy number allele ring (green, gain),
minor copy number allele ring (pink, loss). Central lines represent rearrangements
(green, tandem duplications; pink, deletions; blue, inversions; grey,
interchromosomal events). In each profile, the top right-hand panel displays the
number of mutations contributing to each mutation signature extracted using NNMF
in individual cancers. Middle right-hand panel represents indels. Bottom right corner
shows histogram of rearrangements present in this cancer. Bottom left corner shows
all curated driver mutations; top- and middle-left panels show clinical and pathology
data, respectively.

Extended Data Figure 1 | Landscape of driver mutations. a, Summary of
subtypes of cohort of 560 breast cancers. b, Driver mutations by mutation
type. c, Distribution of rearrangements throughout the genome. Black
line represents background rearrangement density (calculation based on
rearrangement breakpoints in intergenic regions only). Red lines represent
frequency of rearrangement within breast cancer genes.

Extended Data Figure 2 | Rearrangements in oncogenes. a, Variation
in rearrangement and copy number events affecting ESR1. Clear
amplification in top panel, transection of ESR1 in middle panel and
focused tandem duplication events in bottom panel. b, Predicted outcomes
of some rearrangements affecting ETV6. Red crosses indicate exons
deleted as a result of rearrangements within the ETV6 gene, black dotted
lines indicate rearrangement break points resulting in fusions between
ETV6 and ERC, WNK1, ATP2B1 or LRP6. ETV6 domains indicated are:
N-terminal (NT) pointed domain and E26 transformation-specific DNA
binding domain (ETS).


© 2016 Macmillan Publishers Limited. All rights reserved 




Extended Data Figure 3 (image not reproduced). a, Manhattan plot by chromosome; labelled loci include PIK3CA, GATA3, PLEKHS1, TP53 and GPR126, with promoter, intron and intergenic annotations. b, Substitution and indel tracks for the lncRNAs NEAT1 and MALAT1 against chr11 coordinate, shown above UCSC Genome Browser (hg19) annotation tracks and BWA coverage depth tracks.


Extended Data Figure 3 | Recurrent non-coding events in breast
cancers. a, Manhattan plot demonstrating sites with most significant
P values as identified by binning analysis. Purple highlighted sites were
also detected by the method seeking recurrence when partitioned by
genomic features. b, Locus at chr11 65 Mb, which was identified by
independent analyses as being more mutated than expected by chance.
Bottom, a rearrangement hotspot analysis identified this region as a
tandem duplication hotspot, with nested tandem duplications noted at
this site. Partitioning the genome into different regulatory elements, an
analysis of substitutions and indels identified ncRNAs MALAT1 and
NEAT1 (topmost panels) with significant P values.

Extended Data Figure 6 | Rearrangement cluster groups and associated features. a, Overall survival (OS) by rearrangement cluster group. b, Age of
diagnosis. c, Tumour grade. d, Menopausal status. e, ER status. f, Immune response metagene panel. g, Lymphocytic infiltration score.

Extended Data Figure 8 | Hotspots of tandem duplications. A tandem duplication hotspot occurring in six different patients.

Extended Data Figure 9 | Rearrangement breakpoint junctions. a, Breakpoint features of rearrangements in 560 breast cancers by rearrangement
signature. b, Breakpoint features in BRCA and non-BRCA cancers.

Extended Data Figure 10 | Signatures of focal hypermutation.
a, Kataegis and alternative kataegis occurring at the same locus (ERBB2
amplicon in PD13164a). Copy number (y axis) depicted as black dots.
Lines represent rearrangement breakpoints (green, tandem duplications;
pink, deletions; blue, inversions). Top, an ~10 Mb region including
the ERBB2 locus. Middle, zoomed-in tenfold to an ~1 Mb window
highlighting co-occurrence of rearrangement breakpoints, with copy
number changes and three different kataegis loci. Bottom, demonstrates
kataegis loci in more detail. log10 intermutation distance on y axis. Black
arrow, kataegis; blue arrows, alternative kataegis. b, Sequence context of
kataegis and alternative kataegis identified in this data set.

Proteogenomics connects somatic
mutations to signalling in breast cancer 


Philipp Mertins, D. R. Mani, Kelly V. Ruggles, Michael A. Gillette, Karl R. Clauser, Pei Wang, Xianlong Wang,
Jana W. Qiao, Song Cao, Francesca Petralia, Emily Kawaler, Filip Mundt, Karsten Krug, Zhidong Tu, Jonathan T. Lei,
Michael L. Gatza, Matthew Wilkerson, Charles M. Perou, Venkata Yellapantula, Kuan-lin Huang, Chenwei Lin,
Michael D. McLellan, Ping Yan, Sherri R. Davies, R. Reid Townsend, Steven J. Skates, Jing Wang, Bing Zhang,
Christopher R. Kinsinger, Mehdi Mesri, Henry Rodriguez, Li Ding, Amanda G. Paulovich, David Fenyö, Matthew J. Ellis,
Steven A. Carr & the NCI CPTAC


Somatic mutations have been extensively characterized in breast cancer, but the effects of these genetic alterations on 
the proteomic landscape remain poorly understood. Here we describe quantitative mass-spectrometry-based proteomic 
and phosphoproteomic analyses of 105 genomically annotated breast cancers, of which 77 provided high-quality data. 
Integrated analyses provided insights into the somatic cancer genome including the consequences of chromosomal 
loss, such as the 5q deletion characteristic of basal-like breast cancer. Interrogation of the 5q trans-effects against the 
Library of Integrated Network-based Cellular Signatures connected loss of CETN3 and SKP1 to elevated expression
of epidermal growth factor receptor (EGFR), and SKP1 loss also to increased SRC tyrosine kinase. Global proteomic 
data confirmed a stromal-enriched group of proteins in addition to basal and luminal clusters, and pathway analysis 
of the phosphoproteome identified a G-protein-coupled receptor cluster that was not readily identified at the mRNA 
level. In addition to ERBB2, other amplicon-associated highly phosphorylated kinases were identified, including CDK12, 
PAK1, PTK2, RIPK2 and TLK2. We demonstrate that proteogenomic analysis of breast cancer elucidates the functional
consequences of somatic mutations, narrows candidate nominations for driver genes within large deletions and amplified
regions, and identifies therapeutic targets.


A central deficiency in our knowledge of cancer concerns how
genomic changes drive the proteome and phosphoproteome to execute
phenotypic characteristics. The initial proteomic characterization
in The Cancer Genome Atlas (TCGA) breast cancer study
was performed using reverse phase protein arrays (RPPA); however,
this approach is restricted by antibody availability. To provide greater
analytical breadth, the NCI Clinical Proteomic Tumor Analysis
Consortium (CPTAC) is using mass spectrometry to analyse the
proteomes of genome-annotated TCGA tumour samples. Here we
describe integrated proteogenomic analyses of TCGA breast cancer
samples representing the four principal mRNA-defined breast cancer
intrinsic subtypes.


Proteogenomic analysis of TCGA samples 

105 breast tumours previously characterized by the TCGA were 
selected for proteomic analysis after histopathological documentation 
(Supplementary Tables 1 and 2). The cohort included a balanced representation
of PAM50-defined intrinsic subtypes including 25 basal-like,
29 luminal A, 33 luminal B, and 18 HER2 (ERBB2)-enriched
tumours, along with 3 normal breast tissue samples. Samples were
analysed by high-resolution accurate-mass tandem mass spectrometry
(MS/MS) that included extensive peptide fractionation and
phosphopeptide enrichment (Extended Data Fig. 1a). An isobaric peptide
labelling approach (iTRAQ) was employed to quantify protein and
phosphosite levels across samples, with 37 iTRAQ 4-plexes analysed 
in total. A total of 15,369 proteins (12,405 genes) and 62,679 phos- 
phosites were confidently identified with 11,632 proteins per tumour 
and 26,310 phosphosites per tumour on average (Supplementary 
Tables 3, 4 and Supplementary Methods). After filtering for observation 
in at least a quarter of the samples (Supplementary Methods, Extended 
Data Fig. 1b), 12,553 proteins (10,062 genes) and 33,239 phosphosites, 
with their relative abundances quantified across tumours, were used in 
subsequent analyses in this study. Stable longitudinal performance and 
low technical noise were demonstrated by repeated interspersed analyses
of a single batch of patient-derived luminal and basal breast cancer xenograft
samples (Extended Data Fig. 1d, e). Owing to the heterogeneous
nature of breast tumours, and because proteomic analyses were
performed on tumour fragments that were different from those used in
the genomic analyses, rigorous pre-specified sample and data quality
control metrics were implemented (Supplementary Discussion and
Extended Data Figs 2, 3). Extensive analyses concluded that 28 of the 105 
samples were compromised by protein degradation. These samples were 
excluded from further analysis with subsequent informatics focused on 
the 77 tumour samples and three biological replicates. 
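
The observation filter described above (retaining proteins and phosphosites quantified in at least a quarter of the samples) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the dictionary layout, the function name `filter_by_observation` and the toy identifiers are all hypothetical, and a missing measurement is represented as `None`.

```python
from typing import Dict, List, Optional

# Hypothetical layout: identifier -> one log-ratio per sample (None = not observed).
Matrix = Dict[str, List[Optional[float]]]


def filter_by_observation(matrix: Matrix, min_fraction: float = 0.25) -> Matrix:
    """Keep rows quantified in at least `min_fraction` of the samples."""
    kept: Matrix = {}
    for name, values in matrix.items():
        if not values:
            continue  # no samples at all, nothing to keep
        observed = sum(v is not None for v in values)
        if observed / len(values) >= min_fraction:
            kept[name] = values
    return kept


# Toy data with made-up identifiers and 8 samples.
toy: Matrix = {
    "PROT_A": [0.1, None, 0.3, 0.2, None, 0.0, -0.1, 0.4],      # 6 of 8 observed
    "PROT_B": [None, None, None, None, None, None, 0.5, None],  # 1 of 8 observed
    "PROT_C": [None, 0.2, None, None, None, None, 0.1, None],   # 2 of 8 observed
}
filtered = filter_by_observation(toy)
print(sorted(filtered))
```

With the default threshold, a row observed in exactly one quarter of the samples is retained, which matches the "at least a quarter" wording.
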
Figure 1 | Direct effects of genomic alterations on protein level.
a, b, Overlap of protein-coding single amino acid variants (a) and RNA
splice junctions (b) not present in RefSeq v60 detected by DNA exome
sequencing, RNA-seq, and LC-MS/MS. Proportions of novel variants
are noted. c, Heat map of mutations/CNA and their effects on RNA and
protein expression of breast-cancer-relevant genes across tumour and
normal samples. ER, PR, HER2 and PAM50 status are annotated. Median
iTRAQ protein abundance ratio and the most frequently detected and
differential phosphosite ratio are shown for each gene. Pearson correlations
between MS/MS protein and RNA-seq, and MS/MS protein and RPPA are
indicated.

Genome and transcriptomic variation was observed at the peptide
level by searching MS/MS spectra not matched to RefSeq against a
patient-specific sequence database (Fig. 1a). The database was constructed
using the QUILTS software package, leveraging RefSeq gene
models based on whole-exome and RNA-seq data generated from
portions of the same tumours and matched germline DNA (Fig. 1a,
Supplementary Table 5). Although these analyses detected a number
of single amino acid variants, frameshifts, and splice junctions, including
splice isoforms that had been detected as only single transcript
reads by RNA-seq (Fig. 1b, Supplementary Table 5), the number of
genomic and transcriptomic variants that were confirmed as peptides
by MS/MS was low (Supplementary Discussion). Sparse detection of
individual genomic variants by peptide sequencing has been noted in
our previous studies and reflects limited coverage at the single amino
acid level with current technology. However, quantitative MS/MS analysis
of multiple peptides for each protein is used to reliably infer overall
protein levels. This is an advantage of MS/MS, as antibody-based
protein expression analysis is typically based on a single epitope. To 
illustrate this capability in the current data set, an initial analysis of 
three frequently mutated genes in breast cancer (TP53, PIK3CA, 
and GATA3) and three clinical biomarkers (oestrogen receptor (ER; 
ESR1), progesterone receptor (PGR), and ERBB2) was conducted 
(Fig. 1c, Supplementary Tables 6, 7 and Supplementary Discussion).
As expected, TP53 missense mutations were associated with elevated
MS/MS-based protein levels, as observed by RPPA, especially in basal-like
breast cancer. TP53 nonsense and frameshift mutations were associated
with a decrease in TP53 protein levels that was particularly
pronounced in the MS/MS data. In contrast, the mostly C-terminal
GATA3 frameshift alterations did not result in decreased protein
expression when measured by the median of all GATA3 peptides, suggesting
that these proteins are expressed despite truncation. No consistent
effect of somatic PIK3CA mutation was observed at the level of
protein expression. Good Pearson correlations between RNA-seq and
MS/MS protein-expression levels were found for ESR1 (r = 0.74), PGR
(r = 0.74), ERBB2 (r = 0.84) and GATA3 (r = 0.83), with moderate correlations
observed for PIK3CA (r = 0.45) and TP53 (r = 0.36). Lower
TP53 protein abundance levels compared to mRNA levels were especially
prevalent in luminal tumours, suggesting post-transcriptional
regulatory mechanisms such as proteasomal degradation. To explore
this hypothesis, a search was made for E3 ligases that showed negative
correlation to p53 protein (Supplementary Table 8). These analyses
identified UBE3A (r = −0.42; adjusted P value = 0.05) (Extended Data
Fig. 4a), an established TP53 E3 ligase. In comparing copy number
alterations (CNAs), RNA, and protein levels for GATA3, copy number
gains in chromosome 10q were anticorrelated with RNA and protein
levels in basal-like tumours. This observation prompted a search for
other gains or losses that were anticorrelated with RNA and/or protein
levels (see Extended Data Fig. 4b for further analyses). Overall, six
genes were identified that significantly anticorrelated at a false discovery
rate (FDR) < 0.05 on both RNA and protein levels to their CNA
signals (Extended Data Fig. 4b). GATA3 amplification on 10q in basal-like
breast cancer showed the strongest anticorrelation, followed by the
hexosamine and glycolysis pathway enzymes GFPT2 and HK3, which
are upregulated in basal-like breast cancer despite being subjected to
frequent chromosomal deletion on 5q. Global analysis of the correlation
of mRNA-to-protein yielded a median Pearson value of r = 0.39,
with 6,135 out of 9,302 mRNA-protein pairs (66.0%) correlating significantly
at an FDR < 0.05 (Extended Data Fig. 4c, Supplementary
Table 9 and Supplementary Discussion). Similar to a previous colon
cancer analysis, metabolic functions such as amino acid, sugar and
fatty acid metabolism were found to be enriched among positively
correlated genes whereas ribosomal, RNA polymerase and mRNA
splicing functions were negatively correlated. Overall these analyses
demonstrate the utility of global proteome correlation analysis for both
confirmation of suspected regulatory mechanisms and identification
of candidate regulators meriting further investigation.

Figure 2 | Effects of CNAs on mRNA, protein, and phosphoprotein
abundance. a, Correlations of CNA (x axes) to RNA and protein
expression levels (y axes) highlight new CNA cis- and trans-effects.
Significant (FDR < 0.05) positive (red) and negative (green) correlations
between CNA and mRNAs or proteins are indicated. CNA cis-effects
appear as a red diagonal line, CNA trans-effects as vertical stripes.
Histograms show the fraction (%) of significant CNA trans-effects for
each CNA gene. b, Overlap of cis-effects observed at RNA, protein, and
phosphoprotein levels (FDR < 0.05). c, Trans-effect regulatory candidates
identified among those with significant protein cis-effects using LINCS
CMap. Bars indicate total numbers of significant CNA-protein trans-effects
(grey; FDR < 0.05) and overlap with regulated genes in LINCS
knockdown profiles (red; 4 cell lines; moderated t-test FDR < 0.1).
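
The gene-wise mRNA-to-protein correlation analysis reported in this section (a Pearson r per gene, followed by FDR control) can be illustrated with a short sketch. It is not the procedure used in the study: P values are obtained here from the Fisher z approximation (an assumption made to keep the example dependency-free), FDR control uses the Benjamini-Hochberg step-up adjustment, and all gene names and values are invented.

```python
import math
from statistics import NormalDist, median


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pearson_p_value(r, n):
    """Two-sided P value for r from n samples (Fisher z approximation, n >= 4)."""
    if n < 4:
        raise ValueError("need at least 4 samples")
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy input: two invented genes measured in six samples.
rna = {"GENE_A": [1, 2, 3, 4, 5, 6], "GENE_B": [1, 2, 3, 4, 5, 6]}
protein = {"GENE_A": [1.1, 2.0, 2.9, 4.2, 5.1, 5.8], "GENE_B": [3, 1, 4, 1, 5, 2]}

genes = sorted(rna)
r_values = [pearson(rna[g], protein[g]) for g in genes]
p_adj = benjamini_hochberg([pearson_p_value(r, 6) for r in r_values])
significant = [g for g, q in zip(genes, p_adj) if q < 0.05]
print(median(r_values), significant)
```

For cohorts of the size analysed here the normal approximation is close to the exact t-based P value, but an exact test would be preferable for very small sample numbers.
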


Copy number alterations 

To determine the consequences of CNAs on mRNA, protein, and 
phosphoprotein abundance, both in ‘cis’ on genes within the aberrant 
locus and in ‘trans’ on genes encoded elsewhere, univariate correlation 
analysis was used as previously described. A total of 7,776 genes with
CNA, mRNA and protein measurements were analysed by calculating 
Pearson correlation and associated statistical significance (Benjamini- 
Hochberg-corrected P value) for all possible CNA-mRNA and CNA- 
protein pairs (Fig. 2a, Supplementary Table 10, Extended Data Fig. 5a, 
see Methods). For the phosphoproteome, 4,472 CNA-phosphoprotein 
pairs were analysed (Extended Data Fig. 5b). Significant positive correlations
(cis) were observed for 64% of all CNA-mRNA, 31% of all
CNA-protein, and 20% of all CNA-phosphoprotein pairs (Fig. 2b).
Proteins and phosphoproteins correlated in cis to CNAs were, for the 
most part, a subset of the cis-effects observed in mRNA-CNA corre- 
lation (Fig. 2b, Supplementary Table 10). The fractional difference of 
well-annotated oncogenes and tumour suppressor genes among the 
significantly cis-correlated CNA-mRNA and CNA-protein gene pairs 
was analysed. On the basis of a reference list of 487 oncogenes and 
tumour suppressors (Supplementary Table 10), these cancer-relevant 
genes occur 37.6% more frequently in the subset of genes that correlate
on both CNA-mRNA and CNA-protein levels than in the subset
that correlate only on CNA-mRNA but not on CNA-protein levels
(Fisher exact P value = 0.02). This suggests that CNA events with a
tumour-promoting outcome are more likely to lead to cis-regulatory effects
on both the protein and mRNA level, whereas CNA events with no 
documented role in tumorigenesis are more likely to be neutralized on 
the protein level than on the RNA level. Trans-effects (Fig. 2a) appear 
as vertical bands, with accompanying frequency histograms (in blue) 
highlighting ‘hot spots’ of significant trans-effects. Using a minimum 
threshold of 50 trans-affected genes, 68% of the tested genes were asso- 
ciated with trans-effects on the mRNA level, whereas only 13% were 
associated with effects on the protein level and 8% on the phosphop- 
rotein level. Importantly, CNA-protein correlations appeared to be a 
reduced representation of CNA-mRNA correlations. Furthermore, 
for many CNA regions, correlations were more directionally uniform 
on the protein level than on the mRNA level. CNA regions exhibiting 
the most trans-associations at the protein level were found on chromo- 
somes 5q (loss of heterozygosity (LOH) in basal; gain in luminal B), 
10p (gain in basal), 12 (gain in basal), 16q (luminal A deletion), 
17q (luminal B amplification), and 22q (LOH in luminal and basal) 
(Extended Data Fig. 5a). 
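
The enrichment of cancer-relevant genes among CNA-protein cis-effects was assessed above with a Fisher exact test. A minimal sketch of such a test on a 2 x 2 table is given below. The one-sided (upper-tail) form is an assumption, since the sidedness of the reported test is not stated here, and the counts in the usage example are invented for illustration only.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact P value for the table [[a, b], [c, d]].

    Returns P(X >= a) for a hypergeometric X with the table margins fixed,
    i.e. the chance of seeing at least `a` hits in the first row.
    """
    row1 = a + b          # size of the first group
    col1 = a + c          # total number of hits
    n = a + b + c + d     # all genes considered
    total = comb(n, col1)
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / total


# Invented counts: rows are "cis on mRNA and protein" and "cis on mRNA only",
# columns are "on the oncogene/tumour suppressor list" and "not on the list".
p = fisher_exact_greater(12, 188, 10, 390)
print(f"one-sided P = {p:.4f}")
```
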
Figure 3 | Proteomic and phosphoproteomic subtypes of breast
cancer and subtype-specific pathway enrichment. a, Unsupervised
clustering of RNA-seq and proteomics data restricted to PAM50 genes and
subset of 35 detected proteins reveals high similarity to PAM50 (TCGA)
sample annotation. b, K-means consensus clustering of proteome and
phosphoproteome data identifies basal-enriched, luminal-enriched,


Trans-associations are not necessarily direct consequences of 
the chromosomal aberration. For example, as 5q loss occurs in at 
least 50% of basal-like breast cancers, many of the trans-effects
involve genes that mark the basal subtype. To identify candidate 
driver genes with copy number alterations that are direct drivers of 
trans-effects, results were compared with functional knockdown data 
on 3,797 genes in the Library of Integrated Network-based Cellular 
Signatures (LINCS) database (http://www.lincsproject.org/). For
any given gene with copy number alterations (‘CNA-gene’), sets of
genes were identified corresponding to proteins that changed where
there was gain (‘CNA-gain trans-gene set’) or loss (‘CNA-loss trans-gene
set’). These gene sets were then compared to the effects of gene
knockdown in the LINCS database (see Supplementary Methods). 
Queries for 502 different CNA genes meeting the criteria defined 
above identified 10 CNA genes that could be functionally connected 
to both CNA-gain and CNA-loss trans-protein-level effects (Extended 
Data Fig. 5c, Supplementary Table 11). A permutation-based 
approach implemented to test significance (see Supplementary 
Methods) yielded an FDR <0.05 for 10 genes affected by both CNA 
gains and losses (Fig. 2c). These proteins were defined as potential 
regulatory candidates for the CNA trans-effects observed on the pro- 
teome level in this study, as in a gene-dependent manner an average 
of 17% of these trans-effects were consistent with the knockdown 
profiles. Notably, the established oncogenic receptor tyrosine kinase 
ERBB2 was functionally connected only to CNA gain trans-effects 
(Supplementary Table 11). The E3 ligase SKP1 (ref. 23) and the rib- 
onucleoprotein export factor CETN3, both located on chromosome 
arm 5q with frequent losses in basal-like breast cancer and less fre- 
quent gains in luminal B breast cancer, were detected as potential 
regulators affecting the expression of the tyrosine kinase and ther- 
apeutic target EGFR, and SKP1 also was linked to SRC (Extended 
Data Fig. 5d). Another potential regulator, FBXO7 (a substrate
recognition component of the SCF (SKP1-CUL1-F-box protein)-type
E3 ubiquitin ligase complex), was affected mostly by LOH events
on chromosome 22q. Interestingly, in a recent human interaction
proteome study, SKP1 and FBXO7 were listed as interaction
partners.
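
The comparison of a CNA trans-gene set with a knockdown signature, together with a permutation-based significance estimate, can be sketched in simplified form as below. This is only an illustration of the general idea and not the method detailed in the Supplementary Methods: the overlap statistic, the random-set null model, the function name and all gene identifiers are assumptions.

```python
import random


def overlap_permutation_test(trans_set, knockdown_set, universe, n_perm=2000, seed=0):
    """Overlap between two gene sets and an empirical one-sided P value.

    The null distribution is built by drawing random gene sets of the same
    size as `trans_set` from `universe` and counting their overlap with
    `knockdown_set`.
    """
    rng = random.Random(seed)
    universe = set(universe)
    trans_set = set(trans_set) & universe
    knockdown_set = set(knockdown_set) & universe
    observed = len(trans_set & knockdown_set)
    pool = sorted(universe)  # sorted for reproducible sampling
    hits = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(pool, len(trans_set)))
        if len(random_set & knockdown_set) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Invented example: 100 genes, a 10-gene trans-effect set and a 10-gene
# knockdown signature that share 8 members.
universe = [f"g{i}" for i in range(100)]
trans = [f"g{i}" for i in range(10)]
knockdown = [f"g{i}" for i in range(8)] + ["g50", "g51"]
observed, p_value = overlap_permutation_test(trans, knockdown, universe)
print(observed, p_value)
```

A full analysis would additionally account for the direction of regulation (gain versus loss against knockdown) and would correct the resulting P values across all queried CNA genes.
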


Clustering and network analyses 

Transcriptional profiling has converged on four major breast cancer
subtypes: luminal A, luminal B, basal and HER2-enriched. To
investigate the extent to which the PAM50 ‘intrinsic’ breast cancer 
classification scheme is reflected or refined on the proteome level 
in the CPTAC samples, clustering analyses were first restricted to 
the reduced set of PAM50 genes. When RNA data for the 50 PAM50 
genes were clustered directly (without using a classifier), the clus- 
tering was similar to the TCGA PAM50 annotation (second anno- 
tation bar in Fig. 3a). Restricting both the RNA and proteome data 
to the set of 35 PAM50 genes observed in the proteome produced a
similar result (bottom two annotation bars in Fig. 3a), and all of the
major PAM50 groups were recapitulated in the proteome almost as
well as in the RNA data. This indicates that although different tissue 
sections of the same tumours were used for RNA-seq and protein 
analysis, very similar subtype-defining features can be observed in 
both data types. Global proteome and phosphoproteome data were 
then used to identify proteome subtypes in an unsupervised manner. 
Consensus clustering identified basal-enriched, luminal-enriched, 
and stromal-enriched clusters (Extended Data Figs 6a-d, 7a). Unlike 
the clustering observed with PAM50 genes, mRNA-defined HER2- 
enriched tumours were distributed across these three proteomic sub- 
groups. The basal-enriched and luminal-enriched groups showed a 
strong overlap with the mRNA-based PAM50 basal-like and luminal
subgroups, whereas the stromal-enriched proteome subtype represented
a mix of all PAM50 mRNA-based subtypes, and had a significantly
enriched stromal signature (Extended Data Fig. 3e). Among the
stromal-enriched tumours there was strong representation of reactive
type I tumours, as classified by RPPA (Supplementary Table 12),
showing agreement between the RPPA and mass-spectrometry-based
protein analyses for the detection of a tumour subgroup characterized
by stromal gene expression.

As the basal- and luminal-enriched proteome subgroups are 
coherent, pathway analyses were conducted on these two subtypes, 
using the stromal-enriched subgroup as a control to assess spec- 
ificity (Fig. 3c, Extended Data Fig. 7b, Supplementary Table 13). 
The luminal-enriched subgroup was exclusively enriched for 
oestradiol- and ESR1-driven gene sets. In contrast, multiple gene sets 
were enriched and upregulated specifically in the basal-like tumours. 
Particularly extensive basal-like enrichment was seen for MYC target 
genes; for cell cycle, checkpoint, and DNA repair pathways including 
regulators AURKA/B, ATM, ATR, CHEK1/2, and BRCA1/2; and for 
immune response/inflammation, including T-cell, B-cell, and neutro- 
phil signatures. The complementarity of transcriptional, proteomic, 
and phosphoproteomic data was also highlighted in these analyses 
(Extended Data Fig. 7c, d). 

Using phosphorylation status as a proxy for activity, phosphop- 
roteome profiling can theoretically be used to develop a signalling- 
pathway-based cancer classification. K-means consensus clustering 
was therefore performed on pathways derived from single sample gene 
set enrichment analysis (GSEA) of phosphopeptide data (Methods, 
Supplementary Tables 14 and 15). Of four robustly segregated groups, 
subgroups 2 and 3 substantially recapitulated the stromal- and 
luminal-enriched proteomic subgroups, respectively (Fig. 3d, 
Extended Data Fig. 8a). Subgroup 4 included a majority of tumours 
from the basal-enriched proteomic subgroup, but was admixed par- 
ticularly with luminal-enriched samples. This subgroup was defined 
by high levels of cell cycle and checkpoint activity. All basal and a 
majority of non-basal samples in this subgroup had TP53 mutations. 
Consistent with high levels of cell cycle activity, a multivariate kinase- 
phosphosite abundance regression analysis highlighted CDK1 as one 
of the most highly connected kinases in this study (Extended Data 
Fig. 8b, Supplementary Table 16). Subgroup 1 was a novel subgroup 
defined exclusively in the phosphoproteome pathway activity domain, 
with no enrichment for either proteomic or PAM50 subtypes. It was 
defined by G protein, G-protein-coupled receptor, and inositol phos- 
phate metabolism signatures, as well as ionotropic glutamate signal- 
ling (Fig. 3d). Co-expression patterns among genes/proteins across 
different subgroups were also analysed using a Joint Random Forest 
method that identified network modules, such as an MMP9 module,
with different interaction patterns between basal-enriched and
luminal-enriched subgroups. These latter patterns appeared specific 
to the proteome-level data (Extended Data Fig. 8c-f, Supplementary 
Table 17 and Supplementary Methods). 


Phosphosite markers in PIK3CA- and TP53-mutated
tumours

TP53 and PIK3CA are the most recurrently mutated genes in breast 
cancer, with frequencies for PIK3CA at 43% in luminal tumours and 
for TP53 at 84% in basal-like tumours. Most of the PIK3CA missense
mutations were gain-of-function mutations and therefore were expected
to lead to activation of the PI3K signalling cascade, but the extent to
which this occurs has been controversial and it is unclear which pathway
components are effectors. Marker selection analysis was therefore
performed for upregulated phosphosites in PIK3CA-mutated tumours. 
In total, 62 phosphosites were identified that were positively associated 
with PIK3CA mutation (FDR <0.05), including the kinases RPS6KA5 
and EIF2AK4 (Extended Data Fig. 9a, Supplementary Table 18). 
Calculating the average phosphorylation signal of these marker phos- 
phosites provided a read-out for PI3K pathway activity in PIK3CA- 
mutated tumours, with 15 of the 26 mutated tumours (58%) exhibiting 
an activated PIK3CA mutation signature. Of note, the identified
PIK3CA mutant phosphoproteome signature was activated in all
tumours harbouring helical domain PIK3CA mutations, but only 2 of 
10 tumours harbouring kinase domain mutations. To test if the identi- 
fied differences in the phosphoproteome of PI3K mutant versus wild- 
type tumours could be explained by mutation of PIK3CA, the tumour 
data were compared to phosphosite signatures derived from isogenic 
PIK3CA mutant cell lines (Extended Data Fig. 9b, Supplementary
Table 18). There was an enrichment of signatures derived from helical- 
domain-mutated isogenic cell lines, but not from kinase-domain- 
mutated cells, supporting the observations in primary tumours. 
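
The signature read-out described above, in which the phosphorylation signal of the marker phosphosites is averaged per tumour, can be sketched as follows. The data layout, the site labels and in particular the cutoff used to call a signature "activated" are assumptions made for illustration, since the threshold applied in the study is not given in this section.

```python
from statistics import fmean


def signature_scores(phospho, markers):
    """Mean log-ratio over the marker phosphosites, per tumour.

    `phospho` maps tumour id -> {phosphosite id: log-ratio}; marker sites
    that were not quantified in a tumour are skipped.
    """
    scores = {}
    for tumour, sites in phospho.items():
        values = [sites[m] for m in markers if sites.get(m) is not None]
        if values:
            scores[tumour] = fmean(values)
    return scores


# Invented tumours and phosphosite labels.
phospho = {
    "T1": {"siteA": 1.0, "siteB": 0.5},
    "T2": {"siteA": -0.2, "siteB": 0.0},
    "T3": {"siteA": 0.4},  # siteB not quantified
}
markers = ["siteA", "siteB"]
scores = signature_scores(phospho, markers)

CUTOFF = 0.3  # hypothetical activation threshold
activated = sorted(t for t, s in scores.items() if s > CUTOFF)
print(scores, activated)
```
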

The same strategy was used to identify phosphorylation signal- 
ling events connected to TP53 mutation. A total of 56 phosphosites 
upregulated in TP53-mutated tumours were identified that were 
independent of basal-like subtype association (Extended Data Fig. 9c, 
Supplementary Table 18). Using the average phosphorylation signal 
of these marker phosphosites as a proxy for TP53-mutation-driven 
cell cycle control, 22 of 41 mutated tumours (54%) showed upregulated
signals. This TP53 mutant phosphosignature was somewhat
enhanced in tumours in which mutations occurred almost exclu- 
sively in the DNA-binding region compared to those with nonsense/ 
frameshift mutations. In addition to the well-described checkpoint 
kinase CHEK2, significantly upregulated phosphosites were identi- 
fied for the kinases MASTL and EEF2K in TP53-mutated tumours. 
Single-sample GSEA analysis of isogenic p53-mutant phosphosig- 
natures showed an enrichment of a phosphosignature derived from 
R273H-mutated isogenic cells (Extended Data Fig. 9d), confirming the 
pronounced effect of missense mutations in the DNA-binding region 
on phosphorylation pathways. 


Kinase gene amplification and subtype-specific 
activation 

CNAs span many driver gene candidates and RNA expression has been 
frequently used to narrow candidate nominations. Proteogenomic 
analysis should further promote this nomination process. In can- 
didate refinement, a focus on protein kinases is warranted, as many 
are drug targets. An in-depth proteogenomic pipeline was developed 
that flagged kinases, expression levels of which were at least 1.5 inter- 
quartile ranges higher than the median (Supplementary Table 19). 
A proteogenomic circos-like plot (termed a ‘pircos’ plot) was used to
map these outlier values onto the genome (Fig. 4a, b, Extended Data 
Fig. 10a). The ERBB2 locus showed the strongest effect of increased 
phosphoprotein levels associated with gene-amplification-driven RNA 
and protein overexpression (Fig. 4a). The kinase CDK12 is a positive
transcriptional regulator of homologous recombination repair genes
with its partner cyclin K, and is often encompassed by the ERBB2
amplicon. This gene was also found to be upregulated at the RNA, 
protein, and phosphosite level indicating that CDK12 is highly active 
in the majority of ERBB2-positive tumours (Fig. 4a). The analysis of 
the ERBB2 amplicon also uncovered co-outlier phosphorylation status 
for MED1, GRB7, MSL1, CASC3 and TOP2A, all previously described 
in association with ERBB2 amplification. To better understand the 
downstream effects of ERBB2 amplification, additional phosphosite 
outliers were identified in 41 known ERBB2 signalling genes for the 
15 samples that had ERBB2 phosphosite outlier expression (Extended 
Data Fig. 10b). 
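
The outlier rule stated above (a value at least 1.5 interquartile ranges higher than the median) can be written down in a few lines. The sketch below assumes one value per sample for a single kinase and uses the "inclusive" quartile convention of Python's `statistics.quantiles`; the convention actually used in the study is not specified here, and the sample names and values are invented.

```python
from statistics import quantiles


def flag_outliers(values, k=1.5):
    """Return the outlier threshold and the samples at or above it.

    `values` maps sample id -> expression value for one gene. A sample is
    flagged when its value is >= median + k * IQR across all samples.
    """
    data = sorted(values.values())
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    threshold = med + k * (q3 - q1)
    flagged = sorted(s for s, v in values.items() if v >= threshold)
    return threshold, flagged


# Invented values for nine samples; only the last one is far above the rest.
toy = {f"s{i}": float(i) for i in range(1, 9)}
toy["s9"] = 30.0
threshold, flagged = flag_outliers(toy)
print(threshold, flagged)
```

In this toy case the quartiles are 3, 5 and 7, so the threshold is 5 + 1.5 × 4 = 11 and only sample s9 is flagged. Applying the same rule separately to copy number, RNA, protein and phosphosite values gives the per-layer outlier calls that a pircos-style plot would display.
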

These canonical findings stimulated a proteogenomic analysis to 
identify additional outlier kinases in the breast cancer genome. A proteogenomic
dissection of chromosome 11q based on amplification of PAK1
(Fig. 4b, c), a breast cancer driver kinase, illustrated that
PAK1 is hyperphosphorylated in PAK1-amplified tumours, along with
CLNS1A, RSF1 and GAB2 (ref. 32). Additional examples of outlier
kinases included PTK2 and RIPK2 in association with amplification of 
chromosome 8q (Fig. 4c, Extended Data Fig. 10a, c). PAK1 and TLK2 
(17q23) appear to be luminal-breast-cancer-specific events (Fig. 4c, 
Extended Data Fig. 10c). To further examine whether outlier kinases 
were breast cancer subtype-specific independent of amplification
status, the Benjamini-Hochberg-corrected probability was calculated
of finding the number of phosphosite outliers within a subtype, given
the total number of outliers across all subtypes, the subtype sample size
and the total sample size (Fig. 4d). These analyses led to the expected
identification of ERBB2 in the HER2-enriched subtype at the 5%
FDR level, as well as the new finding of CDC42BPG (MRCKγ), an
effector kinase for RHO-family GTPases. In basal-like breast cancer,
two kinases, PRKDC and SPEG, were significant at the 5% FDR
level. PRKDC is a non-homologous end-joining factor that can be
phosphorylated by ATM kinase, and is therefore a logical finding in
this disease subset. However, SPEG, a kinase associated with severe
dilated cardiomyopathy when suppressed, has not been previously
reported in association with breast cancer. A larger number of
subtype-specific kinases were detected at the 10% FDR level, several
of which have recently described relevance in breast cancer, including
PRKD3 in basal-like breast cancer, the LKB-regulated SIK3 in luminal
A breast cancer and CDK13 in luminal B breast cancer, which,
similar to CDK12, can interact with cyclin K.

Figure 4 | Example analyses of aberrantly regulated kinases in human
breast cancer. a, b, pircos (proteogenomics circos) plots showing CNA,
RNA, protein, and phosphosite expression for 17 tumours with
amplification in 17q (ERBB2 CNA > 1) and 8 tumours with amplification in
11q (PAK1 CNA > 1). Labelled genes have CNA > 1 and phosphosite > 1.
c, Proteogenomic outlier expression analysis for ERBB2, CDK12, and
PAK1. Samples with outlier phosphosite (red), protein (yellow), RNA
(green) and copy number (purple) expression are shown. Phosphosite
squares indicate per-sample outlier phosphosites. d, Outlier kinase events
by PAM50 subtype (>35% of subtype samples contain a phosphosite
outlier; <10% FDR using Benjamini-Hochberg-adjusted P values).

60 | NATURE | VOL 534 | 2 JUNE 2016


Discussion

The breadth and depth of proteomic and phosphoproteomic analyses
displayed in this study demonstrate the strength of mass-spectrometry-based
proteomics, but also some of the limitations
inherent in proteolytic peptide sequencing (see Supplementary
Discussion). An example of how high-dimensional proteomic analysis
provides insight into unresolved genomic issues concerns the study
of loss of the long arm of chromosome 5 (5q). Analysis of RNA and
protein correlations narrowed the list of potential trans-deregulated
proteins. Orthogonal candidate screening using functional genomics 
methodologies identified loss of CETN3 and SKP1 as potential trans- 
regulators, with upregulation of EGFR as a downstream consequence 
in basal-like breast cancers. Although further experimental evi- 
dence must be sought for these proposed regulatory relationships, 
the SKP1-Cullin complex has already been linked to EGFR activation in
glioma. Unfortunately, EGFR targeting has not proven to be effective
therapy in basal-like breast cancer to date. This might be due to the
fact that SKP1 loss deregulates multiple targets, therefore mandating a
much broader inhibitory strategy. 

It is recognized that PIK3CA mutations do not strongly activate 
canonical downstream effectors. Mass-spectrometry-based phosphoproteomics
provides an opportunity for unbiased examination of
downstream signalling events dependent on PIK3CA mutational acti- 
vation. These studies revealed that common PIK3CA mutations affect 
a large number of targets with diverse functionalities including the 
kinases RPS6KA5 and EIF2AK4. Thus, the data and analyses reported 
here extend our knowledge of the effectors that promote tumorigenesis 
in response to constitutive activation of PI3 kinase. Similarly, TP53- 
mutation-associated phosphopeptides point towards novel function- 
alities, including regulation of the kinases MASTL and EEF2K. 

A central goal in breast cancer research has been the identification 
of druggable kinases beyond HER2. Candidate genes that exhibited 
similar gene-amplification-driven proteogenomic patterns to ERBB2 
included CDK12, TLK2, PAK1 and RIPK2. The proteogenomic link 
with gene amplification was particularly strong for CDK12, in keeping
with its location in the ERBB2 amplicon, whereas the strengths of
correlation between DNA amplification, RNA, protein, and phosphoprotein
for the other examples were more variable. The presence of
activated CDK12 in the ERBB2 amplicon might explain why tumours
arising in BRCA1 carriers are usually ERBB2-negative. As a positive
transcriptional regulator of BRCA1 and multiple FANC family members,
CDK12 promotes DNA repair by homologous recombination.
CDK12 amplification would, therefore, oppose the functional effects
of BRCA1 haploinsufficiency during tumour evolution. Overall,
multiple outlier kinases generate testable therapeutic hypotheses for
which enabling inhibitors are in development. For example, PAK1 has
recently been confirmed to be a therapeutic target and poor prognosis
factor in luminal breast cancer.

Although incomplete outcome data and the remarkable heteroge- 
neity of breast cancer are further relevant constraints, the number of 
TCGA specimens analysed here is insufficient to support conclusive 
clinical correlations. Only 8 deaths occurred among the 77 patients,
which is too few to provide sufficient statistical power for association
analysis. Adequately powered MS/MS-based clinical investigation
will require microscaled discovery or targeted approaches", especially 
given the highly limited amount of patient material available from 
clinical trials and the mostly formalin-fixed nature of the specimens. 
The current analysis is therefore centred on biological findings and 
correlations, with orthogonal validation and false discovery concerns 
addressed through an examination of cell-line databases of the effects 
of individual gene perturbations. Typical of a multi-tiered analysis of 
this complexity, there are many hypotheses to test, and many findings 
that require further investigation. 

In conclusion, this study provides a high-quality proteomic resource 
for human breast cancer investigation, and illustrates technologies and 
analytical approaches that provide an important new opportunity to 
connect the genome to the proteome. Larger-scale exploration of dis- 
covery proteomics in the clinical setting will require improvements 
in clinical investigation, including acquisition of adequate amounts 
of optimally collected tumour tissue both before and during therapy 
as well as advances in MS/MS proteomics to reduce sample input and 
increase sensitivity for low abundance proteins and modified peptides. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
quantified in this study. Each iTRAQ MS/MS spectrum measures a peptide
from four samples (3 individual patients and the reference sample mix
of 40 patients). More than 400,000 distinct peptides were identified and
quantified in ~14 million MS/MS spectra. Personalized tumour-specific
whole-exome-sequencing-derived variant calls and RNA-seq-derived
transcript information. All mass spectrometry data were analysed using
the Spectrum Mill software package. b, Overview of proteome and 
phosphoproteome data sets. The table provides a summary of the data 
sets used in specific analyses, including the filters applied to derive the 
proteins and phosphosites/phosphoproteins that constitute each data 
set; the protein, phosphosite or phosphoprotein count; and the methods 
that employ the respective data sets. c, Distribution of sequence coverage 
of the identified proteins with tryptic peptides detected by MS/MS, 
whiskers show the 5-95 percentiles. d, e, Robust and accurate proteome/ 
phosphoproteome platform. Longitudinal performance was tested by 
repeated proteome and phosphoproteome analysis of patient-derived 
xenograft tumours. Scatter plots, histograms and Pearson correlations 
comparing individual replicate measurements are shown. 
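
The four-channel design described in this caption (three patient samples plus a shared reference mix per spectrum) lends itself to a per-spectrum ratio against the reference. The following is a minimal sketch only: it assumes reporter-ion intensities have already been extracted per channel and that ratios are expressed on a log2 scale. The channel labels, the choice of "117" as the reference channel, the intensity values and the function name `log2_ratios_to_reference` are illustrative assumptions, not details taken from the study.

```python
import math


def log2_ratios_to_reference(intensities, reference_channel):
    """Return log2(sample / reference) for every non-reference channel.

    `intensities` maps a channel label to a reporter-ion intensity.
    Channel labels and the reference assignment are hypothetical.
    """
    reference = intensities[reference_channel]
    if reference <= 0:
        raise ValueError("reference intensity must be positive")
    ratios = {}
    for channel, value in intensities.items():
        if channel == reference_channel:
            continue
        # Non-positive intensities cannot be log-transformed; mark as missing.
        ratios[channel] = math.log2(value / reference) if value > 0 else None
    return ratios


# One spectrum: three patient channels plus the pooled reference (assumed "117").
spectrum = {"114": 200.0, "115": 50.0, "116": 100.0, "117": 100.0}
ratios = log2_ratios_to_reference(spectrum, "117")
```

Expressing every sample relative to the same pooled reference is what would make ratios comparable across separate iTRAQ experiments, which is why the sketch keeps the reference channel as an explicit argument.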
Extended Data Figure 2 | Tumour sample quality control. a, Remark
diagram showing sample processing and partitioning. Initial quality
review encompassed histopathological examination of tissue slices stained
with haematoxylin and eosin. *For 3 samples, no tumour cells were seen
on histopathology (BH-A0E9, BH-A0C1, A2-A0SW). These samples were
nevertheless included in the proteome analysis as other quality control
standards were met (see below) and samples with 0% tumour cellularity
on top or bottom sections were included in TCGA analyses. b, Correlation
of TCGA (top or bottom sections) and CPTAC histological assessment
of neoplastic cellularity for samples (n = 105). The average and range of
neoplastic cellularities were identical for CPTAC and TCGA histological
assessments. Averages (s.d.) for neoplastic cellularity were 76% (±17) for
CPTAC, 76% (±15) for TCGA_Top, and 75% (±18) for TCGA_Bottom
histopathology slides (Supplementary Table 2). Note that in three
CPTAC cases where no tumour cells were identified by histopathological
assessment, numbers of protein-level somatic variants were similar to
all other tumours. The identified mutated proteins were TP53_R273C,
NOP58_Q23E, TAGLN2_G154R, TUBA1B_D116H, and MRPL48_I173K
(Supplementary Table 5), indicating presence of tumour cells in these
samples. c, Proteome iTRAQ tumour to internal reference ratio heat map
for all CPTAC samples (8,028 proteins without missing values) including
passed and failed proteomic quality control (QC) samples. d, Global
tumour to reference proteome ratio distributions for samples that passed
and failed proteomic quality control analysis. e, Degradation-related
gene sets were enriched in tumours that failed proteomic quality control
analysis. f, Variant allele frequency (VAF) analysis of re-sequenced CPTAC
tumours and comparison to original TCGA data. Overall VAFs for failed
quality control samples were lower compared to passed samples suggesting
lower purity.
Extended Data Figure 3 | Tumour sample quality control. a, There was
high concordance (94.6%) between DNA variants reported by TCGA and 
CPTAC re-sequenced tumours. Most point mutations reported by TCGA 
could be identified across the eight re-sequenced samples used in the study. 
b, A high overall correlation (mean = 0.77) was observed for the CPTAC 
VAF (x axis) and TCGA VAF (y axis) across the eight samples used in the 
study. c, Agglomerative hierarchical clustering (Supplementary Methods 
section 3.8) used to co-cluster protein and RNA tumour expression data 
after filtering to retain 4,291 proteins and genes with moderate to high 
protein-RNA correlation (Pearson correlation > 0.4) with results displayed 
as a circular dendrogram (fanplot). The proteome (.P) and RNA (.R) 
components of each sample are labelled using the same colour. The outer 
ring shows proteome samples in light grey and RNA samples in dark grey. 
High concordance between RNA and protein expression is evident from 
the colour adjacency in the inner ring and alternating colour in the outer 
ring showing that RNA and protein components co-cluster for a large 
proportion of samples (62 out of 80). d, Co-clustering of MS/MS and RPPA 
tumour data. 126 RPPA readouts were mapped to gene names. These 
genes were intersected with the genes observed in the MS/MS proteome,
filtered to 48 proteins with moderate or higher RPPA-MS/MS protein
correlation, and analysed for co-clustering as in c. 47 of 80 RPPA-MS/MS
protein pairs co-cluster. Although this is a smaller proportion than for RNA-
protein analysis, the number of genes used in the clustering is significantly
smaller for RPPA (48 versus 4,291 for RNA). e, ESTIMATE tumour purity
comparison between mRNA, RNA-seq, and proteome data. ANOVA is used
to assess the difference in distribution (−log10(P value)) of ESTIMATE,
stromal, immune, and tumour purity scores across mRNA (microarray), 
RNA-seq and proteome data. The only significant P value (0.02) is for the 
cluster 3 stromal score, and higher stromal scores for the proteome drive 
that difference. f, Ischaemia score analysis. Comparison of ischaemia scores 
of 77 CPTAC tumours, 3 normal samples, and patient-derived xenografts. 
CPTAC tumours had generally lower ischaemia scores than PDX samples 
subjected to 30 min of cold ischaemia. Median ischaemia scores are less than 
30 min for each subtype and no significant differences were observed across 
subtypes. Effects due to cold ischaemia therefore appear to be negligible in 
this CPTAC sample collection. 
Extended Data Figure 4 | Protein-protein, protein-CNA, and
protein-mRNA correlation analyses. a, Identification of UBE3A as an
E3 ubiquitin ligase that negatively correlates to p53 on the protein level.
Pearson correlation and Benjamini-Hochberg-corrected P value are
shown. b, Analysis of counter-regulated genes with negative correlation of
CNA-RNA as well as CNA-protein levels. Negative Pearson correlations
are shown with Benjamini-Hochberg-corrected P values for CNA-protein
correlations. Depicted genes have significant negative correlations at
FDR <0.05 in the CNA-RNA and CNA-protein analyses. c, Global
mRNA-protein correlation and gene set enrichment analysis.

© 2016 Macmillan Publishers Limited. All rights reserved

Extended Data Figure 5 | Global CNA effects and comparison of CNA
trans-effects to knockdown signatures in the LINCS database. a, CNA
landscape in the CPTAC tumour collection. The segment-based CNAs
of 77 samples were downloaded from TCGA Firehose, including
18 Basal, 12 Her2, 23 Luminal A and 24 Luminal B subtypes. Copy number
amplifications were marked in red and deletions in blue. The bottom
colour key represents the log-transformed copy number value, with
CNA = 2 centred at 0. Specific CNA events are seen for chromosome 5q
and 10p regions in basal-like tumours. b, Correlations of copy number
alterations (x axis) to phosphoprotein levels (y axis) highlight new CNA
cis- and trans-effects. Significant (FDR < 0.05) positive (red) and negative
(green) correlations between CNA and phosphoproteins are indicated.
Histograms show the fraction (%) of significant CNA trans-effects for
each CNA gene. c, LINCS CMap analysis facilitates identification of novel
functional candidates for CNA trans-effects. Knockdown profiles were 
compared with CNA-protein trans-effects for 502 genes. Genes with a 
connectivity score >|90| were considered connected and significant 
cis-effects were annotated at an FDR <0.05. d, Basal-like tumour-specific 
CNAs are candidate regulatory events for EGFR and SRC expression levels. 
Oncogenic kinases with significant CNA-protein trans-effects (left panel), 
that were regulated in LINCS short hairpin RNA experiments (right panel; 
4 cell lines) and directly measured as LINCS landmark genes, are shown 
alongside candidate regulatory genes CETN3 and SKP1. Clinical ER, PR, 
and HER2 annotation and PAM50 classification are shown in the header 
rows of each column. 
Extended Data Figure 6 | Proteome cluster heat map and stability
analysis. a, K-means consensus clustering of proteome and
phosphoproteome data identifies three subgroups: basal-enriched,
luminal-enriched, and stromal-enriched. The heat map represents all
1,521 proteins used for clustering (data set G8). b, Identification of
optimal proteome clusters for quality-control-passed CPTAC breast cancer
tumours. Proteome clusters were derived using consensus clustering
based on 1,000 resampled data sets, exploring the range of 2 to 6 K-means
clusters. Visualization of consensus matrices from K-means consensus
clustering for K = 3, 4, 5 and 6 target clusters. Consensus clustering
was performed on 1,521 proteins with no missing values and s.d. > 1.5.
c, Silhouette plots were generated to evaluate the coherence of the
clustering. Silhouette plots for K = 3 and K = 4 clusters showing a cleaner
separation of clusters for K = 3. d, On the basis of both visual inspection
of the consensus matrix and the delta plot assessing change in consensus
cumulative distribution function (CDF) area, three robustly segregated
groups were observed. Consensus CDF and delta area (change in CDF
area) plots for 2-6 clusters.
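
The silhouette plots mentioned in this caption summarize, per sample, how much closer it sits to its own cluster than to the nearest other cluster. The following is a minimal sketch of the standard silhouette coefficient with Euclidean distance; the distance metric, the toy one-dimensional points and the function name `silhouette_values` are assumptions made for illustration and are not taken from the study's methods.

```python
import math


def silhouette_values(points, labels):
    """Silhouette coefficient of each point (Euclidean distance).

    Requires at least two distinct cluster labels.
    """
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)

    values = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # By convention a singleton cluster gets a silhouette of 0.
            values.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denominator = max(a, b)
        values.append((b - a) / denominator if denominator > 0 else 0.0)
    return values


# Toy data: two well-separated groups on a line (illustrative values only).
toy_points = [(0.0,), (1.0,), (10.0,), (11.0,)]
toy_labels = [0, 0, 1, 1]
toy_silhouettes = silhouette_values(toy_points, toy_labels)
mean_silhouette = sum(toy_silhouettes) / len(toy_silhouettes)
```

Values near 1 indicate tight, well-separated clusters; comparing the mean across candidate values of K is one common way to read a "cleaner separation" from such plots.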
Extended Data Figure 7 | Proteome cluster markers and enriched
pathways. a, Markers (based on SAM analysis; FDR <0.01) discriminate
between proteome clusters 1, 2 and 3 (compare to heat map of proteins
used to derive clusters depicted in Extended Data Fig. 6a). b, Applying
a Fisher-exact-test-based enrichment analysis to the proteome,
phosphoproteome and mRNA data, gene sets from MSigDB were
identified that were unique for each proteome cluster. Heat map showing
specific pathways comprising dominant biological themes that are
significantly differential by enrichment analysis between basal-enriched
and luminal-enriched tumours (Fisher exact test Benjamini-Hochberg-
corrected P values are shown; enrichment test performed on marker
sets identified using SAM analysis; see Methods; compare to Fig. 3c).
or luminal-enriched tumours exclusively by mRNA, protein or
phosphoprotein expression. Cytokine signatures, for example, were 
strongly captured at the mRNA level, but were seen to only a limited 
degree at the global protein level, probably because of their typically low 
protein abundance. By contrast, the vast majority of significant gene sets 
annotated as ‘signaling’ were enriched only at the phosphoprotein level.
d, Global heat map representing all gene sets significantly enriched
in at least one of the proteomic breast cancer subtypes. The stromal-
enriched group was characterized by breast cancer normal-like, adipocyte 
differentiation, smooth muscle, toll-like receptor signalling and endothelin 
gene sets, supporting the clustering-based annotation of high stromal 
and/or adipose content in these tumours (see Supplementary Table 13). 
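
The enrichment P values in this caption are reported after Benjamini-Hochberg correction. The following is a minimal sketch of that standard step-up adjustment; the function name `benjamini_hochberg` and the example P values are illustrative assumptions and do not reproduce any numbers from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    n = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# Illustrative raw P values for four hypothetical gene sets.
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)
```

A gene set would then be called significant at a chosen FDR when its adjusted value falls below that threshold.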
Extended Data Figure 8 | Phosphoproteome pathway clustering,
kinase-phosphosite multivariate regression, and protein
co-expression networks. a, Phosphoproteome pathway clustering.
Using phosphorylation state as a proxy for activity, deep phosphoproteome
profiling allows development of a breast cancer molecular taxonomy
on the basis of signalling pathways. K-means consensus clustering was
performed on pathways derived from single sample GSEA analysis of
phosphopeptide data (908 pathways shown). Of four robustly segregated
groups, subgroups 2 and 3 substantially recapitulated the stromal- and
luminal-enriched proteomic subgroups, respectively. Subgroup 4 included
a significant majority of tumours from the basal-enriched proteomic
subgroup, but was admixed particularly with luminal-enriched samples.
This subgroup was defined by high levels of cell cycle and checkpoint
activity. All basal and a majority of non-basal samples in this subgroup
had TP53 mutations. Subgroup 1 was a novel subgroup defined exclusively
in the phosphoproteome pathway activity domain, with no enrichment
for either proteomic or PAM50 subtypes. It was defined by G protein,
G-protein-coupled receptor, and inositol phosphate metabolism
signatures, as well as ionotropic glutamate signalling. b, Analysis of the
regulatory relationship between outlier kinases (see Supplementary
Table 19) and phosphopeptides by regulatory multivariate regression
analysis (see Methods) identified CDK1 as the most highly connected
of the outlier cyclin-dependent kinases, with highest centrality (based
on node-degree; see Methods) among the outlier CDKs and seventh
highest centrality among all the outlier kinases considered in the
remMap analysis. Each line represents a phosphosite-kinase relationship.
c-f, Analysis of differences in the co-expression patterns among genes/
proteins across different subgroups. A Joint Random Forest method
was applied to simultaneously build gene co-expression and protein
co-expression networks (Supplementary Table 17, and Methods). Modules
in these networks revealed different interaction patterns between basal- 
enriched and luminal-enriched subgroups. c, Network module P1 of the 
protein co-expression network, defined chiefly in the proteome space. 
This module contained 12 genes connected by 39 edges, among which
34 were protein-specific and 5 were shared by both the protein and
mRNA co-expression networks. Many edges were supported by published
information and were contained in the STRING database. Edges in red are
specific to the protein co-expression network; edges in green are shared by
both protein and gene co-expression networks; edges indicated by double
lines are contained in the STRING database with confidence score greater
than 0.15. MMP9, one of the central proteins in this module, contributes
to metastatic progression and is a potential target for anti-metastatic
therapies for basal-like/triple-negative breast cancer. d, Heat maps
of the absolute correlation across each pair of genes in module P1
(shown in c), based on either protein or gene expression data for samples
in the basal-enriched and luminal-enriched subgroups, respectively. The
MMP9 protein was strongly co-expressed with the other members of the
module only in the basal-enriched subgroup. Notably, this observation
is dependent on protein data; the correlation at the mRNA level for this
module was consistently low in both the basal-enriched and luminal-
enriched subgroups indicating that these events coherently occur at the
proteomic level. e, Co-expression network based on proteomics data.
The network contains 693 proteomic network-specific edges (grey) and
792 edges shared with the RNA-seq network (green). For each module,
the most enriched category and corresponding Benjamini-Hochberg-
adjusted P value is reported. Pie charts adjacent to each module show
the proportion of proteomics-specific edges (grey area) and edges shared
between proteomics and RNA-seq data (green area). f, RNA-seq network.
Extended Data Figure 9 | Phosphoproteome signatures of PIK3CA- and
TP53-mutated tumours highlight activated key regulators and indicate
frequency of activation. a, c, Phosphosites upregulated in mutated tumours
(SAM FDR <0.05 across all tumours and independently also across luminal
tumours; average phosphosite signal for all markers shown as bar graph). To
avoid confounding by intrinsic subtype-specific distinctions, only markers
that were significantly identified both in analyses covering all tumours and
analyses restricted to luminal tumours were selected (FDR < 0.05). Colour
bars in the margins indicate FDRs for grouped analysis of different mutation
classes and indicate kinase substrates of known kinases in the respective
pathways. Significantly regulated kinase phosphosites are annotated. The
average phosphorylation signal of the marker phosphosites provides a read-
out for PI3K and TP53 pathway activity in mutated tumours (histogram
below heat map). A 95% prediction confidence interval (indicated by dashed
lines) across the average signal in non-mutated tumours was chosen in order
to discriminate active from non-active tumours. The most strongly activated
PIK3CA kinase domain mutant tumour differed from the other nine kinase
domain mutant tumours, as it contained an amino acid side chain charge
neutral H1047L instead of the more common positively charged H1047R
mutation. Among the 62 phosphosites identified that were significantly
upregulated in PIK3CA-mutated tumours, 13 phosphosites were found
on phosphoproteins that are known substrates of well-annotated kinases
in the PIK3CA pathway (a, right column). In the mutant TP53 analysis,
a total of 20 phosphosites were found on phosphoproteins that are known
substrates of well-annotated kinases in the p53 pathway (c, right column).
b, d, Upregulated phosphosite sets were derived from isogenic PIK3CA
and TP53 mutant versus wild-type cell-line pairs and tested for enrichment
within mutant versus wild-type CPTAC tumours using single sample GSEA.
Significantly enriched phosphosite sets are shown (P < 0.05).
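
The caption above uses a 95% prediction interval around the average marker signal in non-mutated tumours to separate active from non-active tumours. The following is a minimal sketch of that idea under several assumptions: the interval is built as mean ± z·s·sqrt(1 + 1/n), a normal quantile stands in for the Student's t quantile that a small-sample analysis would normally use, and a tumour is called active when its signal exceeds the upper bound. The function names and the example values are hypothetical.

```python
import math
import statistics


def prediction_interval(reference_values, coverage=0.95):
    """Approximate prediction interval for one new observation.

    Uses a normal quantile (an assumption made to stay within the
    standard library); a t quantile would widen the interval slightly.
    """
    n = len(reference_values)
    mean = statistics.fmean(reference_values)
    sd = statistics.stdev(reference_values)
    z = statistics.NormalDist().inv_cdf(0.5 + coverage / 2)
    half_width = z * sd * math.sqrt(1 + 1 / n)
    return mean - half_width, mean + half_width


def is_active(signal, reference_values, coverage=0.95):
    """Call a tumour active if its signal lies above the upper bound."""
    _, upper = prediction_interval(reference_values, coverage)
    return signal > upper


# Hypothetical average marker signals in non-mutated tumours.
wild_type_signals = [-0.5, -0.25, 0.0, 0.25, 0.5]
lower_bound, upper_bound = prediction_interval(wild_type_signals)
```

With these illustrative reference values, a mutated tumour with an average signal of 1.0 would be called active and one with 0.5 would not.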
Extended Data Figure 10 | Pircos plots, kinase outliers and outliers in
the ERBB2 pathway. a, Pircos (proteogenomics circos) plots for 8q and
17q showing median CNA, RNA, protein, and phosphosite expression
for 20 tumours with amplification in 8q based on RIPK2 CNA >1;
23 tumours with amplification in 8q based on PTK2 CNA >1; 15 tumours
with amplification in 17q based on CDK12 CNA >1; and 10 tumours with
amplification in 17q based on TLK2 CNA >1. Red indicates expression
>1, blue <−1, and grey between −1 and 1. Genes with both copy number
amplification (CNA >1) and increased phosphosite expression (p-site >1)
are labelled. b, Phosphosite outliers in known ERBB2 signalling genes.
To better understand the downstream effects of ERBB2 amplification,
phosphosite outliers in known ERBB2 signalling genes (MSigDB’ pathway
set, ‘KEGG_ERBB_SIGNALING_PATHWAY’) were identified for the
15 samples that had ERBB2 phosphosite outlier status. Forty-one genes
were identified as having a phosphosite outlier in at least one of the ERBB2-
amplified samples. PAK4 and ARAF phosphosite outlier status were found
in seven of the 15 ERBB2 kinase outlier samples; GSK3B outliers were
found in 6 samples; and EIF4EBP1, MAP2K2, ABL1 and AKT1 outlier
status was found in 5 of the 15 samples. c, Proteogenomic outlier expression
analysis for TLK2 and RIPK2. Samples with outlier phosphosite (red),
protein (yellow), RNA (green) and copy number (purple) expression are
shown. Phosphosite squares indicate per-sample outlier phosphosites.
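
The colour coding and labelling rule stated for the Pircos plots in this caption can be written as two small functions. This is a sketch only: the function names, the placeholder gene names and the numeric values are hypothetical, and the thresholds are simply those quoted in the caption (red above 1, blue below −1, grey in between; a gene is labelled when both CNA and phosphosite expression exceed 1).

```python
def pircos_colour(value):
    """Map a median expression value to the colour named in the caption."""
    if value > 1:
        return "red"
    if value < -1:
        return "blue"
    return "grey"


def labelled_genes(records):
    """Return genes with both CNA > 1 and phosphosite expression > 1.

    `records` maps a gene name to a (cna, phosphosite) pair.
    """
    return [
        gene
        for gene, (cna, phosphosite) in records.items()
        if cna > 1 and phosphosite > 1
    ]


# Placeholder genes with made-up values, purely for illustration.
example_records = {
    "GENE_A": (2.1, 1.8),   # amplified and phosphosite elevated
    "GENE_B": (1.5, 0.2),   # amplified only
    "GENE_C": (0.3, 1.9),   # phosphosite elevated only
}
example_labels = labelled_genes(example_records)
```

Values of exactly 1 or −1 fall into the grey band here, following the strict inequalities in the caption.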
Activation of NMDA receptors and the
mechanism of inhibition by ifenprodil


Nami Tajima1, Erkan Karakas1, Timothy Grant2, Noriko Simorowski1, Ruben Diaz-Avalos2, Nikolaus Grigorieff2 & Hiro Furukawa1


The physiology of N-methyl-D-aspartate (NMDA) receptors is fundamental to brain development and function. NMDA
receptors are ionotropic glutamate receptors that function as heterotetramers composed mainly of GluN1 and GluN2 
subunits. Activation of NMDA receptors requires binding of neurotransmitter agonists to a ligand-binding domain 
(LBD) and structural rearrangement of an amino-terminal domain (ATD). Recent crystal structures of GluN1-GluN2B 
NMDA receptors bound to agonists and an allosteric inhibitor, ifenprodil, represent the allosterically inhibited state. 
However, how the ATD and LBD move to activate the NMDA receptor ion channel remains unclear. Here we applied 
X-ray crystallography, single-particle electron cryomicroscopy and electrophysiology to rat NMDA receptors to show 
that, in the absence of ifenprodil, the bi-lobed structure of GluN2 ATD adopts an open conformation accompanied by 
rearrangement of the GluN1-GluN2 ATD heterodimeric interface, altering subunit orientation in the ATD and LBD and 
forming an active receptor conformation that gates the ion channel. 


NMDA receptors are critically involved in brain development and 
function, including learning and memory formation. NMDA recep- 
tors belong to the family of ionotropic glutamate receptors, which 
are glutamate-gated ion channels comprised of three major families, 
α-amino-3-hydroxy-5-methyl-4-isoxazole propionic acid (AMPA)
(GluA1-4), kainate (GluK1-5), and NMDA receptors (GluN1, 
GluN2A-D, and GluN3A, B)!. NMDA receptors are obligatory het- 
erotetramers mainly composed of two copies each of the GluN1 and 
GluN2 subunits, which bind glycine and L-glutamate, respectively. 
Under physiological conditions, the opening of the NMDA receptor 
ion channel requires concurrent binding of glycine and L-glutamate*“, 
and relief of magnesium block at the ion channel pore by membrane 
depolarization®®. The resulting calcium flux’ triggers a cascade of 
signal transduction necessary for synaptic plasticity®. Dysfunctional 
NMDA receptors are implicated in various neurological diseases and 
disorders such as Alzheimer’s disease, depression, stroke, epilepsy and 
schizophrenia”. 

NMDA receptor subunits, like those of other ionotropic glutamate 
receptor family members, are composed of multiple domains including 
an ATD, LBD, transmembrane domain (TMD) and carboxy-terminal 
domain (CTD) (Extended Data Fig. 1). Binding of neurotransmitter 
agonists to the LBD produces a large conformational change involving 
closure of the bi-lobed structure that is required for ion channel gat- 
ing in all ionotropic glutamate receptors!°"'”, but a distinctive feature 
of NMDA receptors is that activity is also robustly regulated by the 
ATD¥. For example, the ATD controls the open probability and speed 
of deactivation!*", and binds allosteric modulator compounds to reg- 
ulate ion channel activity!®. In contrast to NMDA receptors, there is 
no apparent role for the ATDs of AMPA and kainate receptors'”~*° 
in regulating the ion channel activities, even though they are essen- 
tial for subunit assembly”!. The recent crystal structures of intact 
heterotetrameric GluN1-GluN2B NMDA receptors complexed with 
agonists and allosteric inhibitors, ifenprodil or Ro 25-6981, revealed 
that the ATD and LBD interact tightly via a large interface area, unlike 
GluA2 AMPA receptor and GluK2 kainate receptor whose ATDs and 
LBDs interact minimally!””°?3, implying that activation of NMDA 
receptors requires concerted conformational alterations in the ATD 


and LBD'*”*3, The structures of the intact GluN1-GluN2B NMDA 
receptors”*”? and of the isolated ATDs complexed to ifenprodil** or 
zinc” showed a closed conformation of the bi-lobed GluN2B ATD 
architecture””-*°, probably representing the ‘allosterically inhibited’ 
functional state. In the presence of agonists, NMDA receptors are 
known to reside in active states that can trigger ion channel opening, 
as well as desensitized states with a channel that is closed even in the 
presence of bound agonists”°. Despite accumulating structural infor-
mation on intact NMDA receptors’”?? , as well as isolated ATDs?*?>?”
and LBDs!!®9, there is a lack of structures representing the active 
state and the mechanism of activation has remained unclear. In this 
study, we present structures of the isolated ATD in the apo-state and of 
the intact receptor in the activated conformation, providing a detailed 
mechanistic picture of receptor activation. 


Opening of the GluN2B ATD and subunit rearrangement 
The only available structures for the heterodimeric NMDA receptor 
ATDs to date are those bound to allosteric inhibitors ifenprodil and 
Ro 25-6981, representing the allosterically inhibited state?”-*°. We rea- 
soned that by conducting structural studies without allosteric inhib- 
itors, we could capture the ATD conformation that can activate the 
NMDA receptor ion channel. Thus, we determined the crystal structure 
of GluN1-GluN2B ATDs in the absence of an allosteric inhibitor (apo-
GluN1b-GluN2B ATD) at 2.9 Å resolution (Extended Data Table 1).
We crystallized the purified GluN1b-GluN2B ATD proteins complexed 
to a Fab fragment derived from mouse monoclonal IgG to improve the 
quality of the crystals (Extended Data Fig. 2). The crystallographic analysis 
shows heterodimeric GluN1-GluN2B ATDs that have a bi-lobed 
architecture composed of the regions previously called R1 and R2 in the 
structure of GluN1b-GluN2B ATD bound to the allosteric inhibitor 
ifenprodil”* (Fig. 1). There are a number of differences between the 
structures of the apo-GluN1b-GluN2B ATD and the ifenprodil-bound 
GluN1b-GluN2B ATD™*. The most apparent difference is the separa- 
tion of GluN1b R1 and GluN2B R2? in the apo-GluN1b-GluN2B ATD, 
owing to the ~20° rigid-body opening of the GluN2B ATD bi-lobed 
structure in the apo-GluN1b-GluN2B ATD compared to that in 
the ifenprodil-GluN1b-GluN2B ATD (Fig. 1d). This observation is 


1Cold Spring Harbor Laboratory, W. M. Keck Structural Biology Laboratory, Cold Spring Harbor, New York 11724, USA. 2Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, Virginia 20147, USA.

Figure 1 | Structures of GluN1b-GluN2B ATD heterodimers.
a, b, Crystal structure of the GluN1b-GluN2B ATD heterodimer in the
apo state solved at 2.9 Å (a) in comparison with the ifenprodil-bound
structure (PDB ID, 3QEL) (b). The R1 and R2 lobes are coloured magenta 
and light pink for GluN1b ATD, and cyan and yellow for GluN2B ATD. 
Ifenprodil is represented by green spheres. c, d, Superimposition of the 
R1 lobes of GluN1b (c) and GluN2B (d) in the apo and ifenprodil-bound 
(grey) forms illustrates the relative ‘opening’ between R1 and R2 lobes. 

e, Superimposing the GluN2B R1 lobes of apo and ifenprodil-bound 
forms reveals an ~15° rotation of GluN1b ATD relative to GluN2B ATD 
along the axis of rotation (black rod). The distance of the R2 lobes in the 
GluN1b-GluN2B heterodimers is measured between GluN1b Lys178 and 
GluN2B Asn184 (green spheres). CT, C terminal; NT, N terminal. 


consistent with previous work suggesting that GluN2B ATD has 
open-cleft and closed-cleft conformations in the absence and pres- 
ence of ifenprodil, respectively, on the basis of luminescence res- 
onance energy transfer studies*®. Another major difference is the 
rearrangement of the GluN1b and GluN2B subunits involving an
~15° rotation relative to one another (Fig. 1e). This rearrange-
ment brings the lower lobes (R2) of GluN1-GluN2B considerably
closer together in the apo-GluN1b-GluN2B ATD compared with
the ifenprodil-GluN1b-GluN2B ATD (Fig. 1e, Extended Data
Fig. 2c). For example, the distance between the Cα atoms of
GluN1b Lys178 and GluN2B Asn184 in apo-GluN1b-GluN2B
ATD is 4.4 Å closer than in the ifenprodil-GluN1b-GluN2B ATD
(Fig. 1e).
As the subunit arrangement in the apo-GluN1b-GluN2B ATD in 
our crystal structure is different from that previously observed in the 
ifenprodil-GluN1b-GluN2B ATD”, we sought to validate its phys- 
iological relevance. Towards this end, we tested whether an inter- 
subunit disulfide bond can form at the subunit interface observed 
in the apo-GluN1b-GluN2B ATD, but not in the ifenprodil-
GluN1b-GluN2B ATD in the context of the intact GluN1-GluN2B 
NMDA receptor by mutating GluN1 and GluN2B residues that are 
proximal to each other. We expected a spontaneous disulfide bond 
to form between the mutated cysteines in the intact GluN1-GluN2B 
NMDA receptor if the subunit interface observed in the crystal struc- 
ture is physiological. We engineered cysteine residues at GluN1b 
Phe113 and GluN2B Ala107, and at GluN1b Gly331 and GluN2B
Glu75, expressed and purified the mutant GluN1b-GluN2B NMDA 
receptor in the context of the intact ion channel, and conducted west- 
ern blot analysis under non-reducing conditions to detect band shifts 
(Extended Data Fig. 3a). In the two selected positions, the disulfide 
bonds are formed only when the cysteine mutant (Extended Data 
Fig. 3) of GluN1 and that of GluN2B are co-expressed and detected 
by an anti-GluN1 and anti-GluN2B western blot in the absence of 
Figure 2 | Conformational trap identifies the apo-GluN1b-GluN2B
ATD structure as the ‘active’ form. a, Location of engineered
cysteines in the crystal structure of the apo-GluN1b-GluN2B ATD
(GluN1-4b(Ala175Cys)/GluN2B(Gln180Cys) in green spheres and
GluN1-4b(Lys178Cys)/GluN2B(Asn184Cys) in blue spheres).
b, Application of 200 μM M4M in the presence or absence of 100 μM
agonists (glycine and glutamate (gly/glut)) potentiates the macroscopic
current measured at the holding potential of −60 mV by two-electrode
voltage clamp. No potentiation was observed when M4M was applied
in the presence of ifenprodil (ifen.). Shown here are the representative
recording profiles for the GluN1-4b(Ala175Cys)/GluN2B(Gln180Cys)
pair. c, d, Fold of potentiation is presented as I_MTS/I_0 (I, current amplitude)
as measured in b for bifunctional MTS with different linker lengths (c) and
M4M applied in different functional states (d). Error bars represent ±s.d.
for data obtained from at least five different oocytes per experiment.


β-mercaptoethanol. When the cysteine mutant of one subunit is 
co-expressed with the wild type of the other subunit, no disulfide 
bonds are formed, indicating that they are specifically formed by the 
engineered cysteines. Taken together, the above experiments show 
that the GluN1-GluN2B subunit arrangement observed in the apo- 
GluN1b-GluN2B ATD crystal structure exists in the context of the 
intact GluN1b-GluN2B NMDA receptor. 


Active conformation of the ATD 

To understand the functional state that the crystal structure of the apo- 
GluN1b-GluN2B ATD may represent, we next attempted to stabilize 
the conformation observed in the crystal structure and assessed the ion 
channel activity. We engineered cysteines at the positions in the lower 
lobes (R2) of the GluN1b and GluN2B ATDs (GluN1b(Ala175Cys)/ 
GluN2B(Gln180Cys) and GluN1b(Lys178Cys)/GluN2B(Asn184Cys)), 
which face each other and should ‘trap’ the conformation observed in 
the crystal structure by tethering the engineered cysteines with bifunc- 
tional methanethiosulfonate (bi-MTS) reagents (Fig. 2). The distances 
between the mutated residues are shorter in apo-GluN1b-GluN2B ATD 
than in ifenprodil-GluN1b-GluN2B ATD as mentioned above (Fig. 1e). 
When bi-MTS, equal or shorter in length than M4M, binds to the lower 
lobes of the GluN1b-GluN2B heterodimers, we reasoned that the con- 
formation observed in the apo-GluN1b-GluN2B ATD with the open 
GluN2B bi-lobed architecture and the rearranged GluN1-GluN2B 

Figure 3 | Overall structures of the intact GluN1-GluN2B NMDA 
receptors at different conformational states. a, The crystal structure of 
GluN1a-GluN2B NMDA receptor in complex with glycine, L-glutamate 
and ifenprodil (PDB ID, 4PE5). b-d, Cryo-EM structures of glycine- and 
L-glutamate-bound GluN1b-GluN2B NMDA receptors classified to reveal 
different conformations representing the ‘non-active’ (b, c) and ‘active’ (d) 
states. Subunits: red, GluN1 (α); orange, GluN1 (β); blue, GluN2B (α); 


subunit orientation should be trapped. To test this, we co-expressed the 
cysteine mutants of GluN1b and GluN2B in Xenopus oocytes and probed 
the effect of the bi-MTS reagents on the macroscopic current of NMDA 
receptor by two-electrode voltage clamp. We began this experiment 
by testing bi-MTS with the four-carbon linker (M4M in Fig. 2a) as the 
estimated distances between the γ-sulfur atoms of the mutated cysteines in 
the GluN1b(Ala175Cys)/GluN2B(Gln180Cys) and GluN1b(Lys178Cys)/ 
GluN2B(Asn184Cys) mutants of apo-GluN1b-GluN2B ATD are 
~10 Å and ~9 Å, respectively, roughly matching the length of M4M. The 
application of M4M to the GluN1b(Ala175Cys)/GluN2B(Gln180Cys) 
and GluN1b(Lys178Cys)/GluN2B(Asn184Cys) mutants potentiates the 
NMDA receptor currents by ~3-4-fold (Fig. 2b, c, Extended Data Fig. 4). 
No such effect is observed when the cysteine mutants of one subunit 
are co-expressed with the wild type of the other subunit, indicating that 
the observed functional effect is specific to the engineered cysteines 
(Fig. 2a and Extended Data Fig. 5a, b). We suggest that the bi-MTS 
conformational trap potentiates the current by favouring the 
‘active’ form of the NMDA receptor ion channel. The effect of M4M 
is observed both in the presence and absence of glycine and glutamate, 
indicating that conformational alteration in the ATD is independent 
of agonist binding in the LBD. Furthermore, the potentiation effect 
was also observed when M2M was applied to both of the above mutant 
pairs, indicating that the GluN1b-GluN2B distance in R2 may move 
even closer than observed in the crystal structure, consistent with the 
single-particle electron cryomicroscopy (cryo-EM) structures shown 
later. By contrast, when adding M8M, a bi-MTS agent that is 4-5 Å 
longer than the inter-cysteine distances observed in the apo-GluN1b- 
GluN2B ATD, no potentiating effect was observed, supporting the view 
that the distance between the R2 lobes of GluN1b-GluN2B must be 
reduced during activation (Fig. 2c, Extended Data Fig. 5). Finally, when 
M4M was applied in the presence of ifenprodil, we observed little or no 
potentiating effect, indicating that it traps the active conformation of 
GluN1b-GluN2B ATDs but not the inhibited conformation as repre- 
sented by the crystal structure of the ifenprodil-GluN1b-GluN2B ATD 
(Fig. 2b, d). Taken together, these experiments indicate that the protein 
conformation observed in the crystal structure of the apo-GluN1b- 
GluN2B ATD probably represents the active conformation that facili- 
tates ion channel opening. 

cyan, GluN2B (β). The two ‘non-active’ 3D classes (non-active 1 and 2) 
have different distances between the two GluN1-GluN2B ATD 
heterodimers represented as the distance between the Cα atoms of Glu320 
(299 in GluN1a) of GluN1b (α) and GluN1b (β) (double-sided arrows). 
The amino and carboxy termini and approximate domain boundaries 

are indicated. e-g, The cryo-EM maps for the ‘non-active 1’ (e), 
‘non-active 2’ (f) and ‘active’ (g) states. 


Structures of intact GluN1b-GluN2B NMDA receptors 
We next investigated how the changes in the GluN1-GluN2B ATD 
conformation alter subunit arrangement and inter-ATD-LBD 
interactions to ultimately mediate gating of the ion channel. 
To answer this, we determined cryo-EM structures of the intact 
heterotetrameric rat GluN1b-GluN2B NMDA receptor ion chan- 
nel in the presence of glycine and L-glutamate and in the absence 
of ifenprodil. The cryo-EM structures were reconstructed at res- 
olutions better than 7 Å and revealed clear secondary structure 
elements (Fig. 3, Extended Data Figs 6, 7 and Extended Data Table 2). 
The cryo-EM structures show conservation of general features 
observed in the recent full-length NMDA receptor crystal structures, 
including a dimer of GluN1-GluN2B heterodimers arrangement at 
the ATD and LBD layers, the domain swap between the ATD and 
LBD, and pseudo-four-fold symmetrical subunit arrangement at 
the TMD>. Importantly, three-dimensional (3D) classification 
of the cryo-EM data revealed different conformational states pres- 
ent in the data set (Fig. 3). Overall, there are roughly three distinct 
conformations, which we define as ‘non-active 1’, ‘non-active 2’ and 
‘active’ (Fig. 3). When compared to the crystal structure of the intact 
NMDA receptors bound to ifenprodil, glycine and L-glutamate”””?, 
which represent the allosterically inhibited functional state, all of 
the 3D classes contain a GluN2B ATD open bi-lobed architecture, 
with an ~14°-21° opening similar to the crystal structure of the apo- 
GluN1b-GluN2B ATD. This opening of the GluN2B ATD increases 
the distance between the two GluN1 ATDs by as much as ~29 Å in 
the intact NMDA receptor compared to the ifenprodil-bound form 
(Fig. 3). The comparison shows that, upon ifenprodil binding, the 
R1 lobe moves relative to the LBD and TMD to close the bi-lobed 
architecture of the GluN2B ATD, as well as the gap between the two 
GluN1 ATDs to inhibit receptor activity. 

The two 3D classes, non-active 1 and non-active 2, are both in the state 
where agonists are bound to the LBD but the ion channel is closed. When 
focusing on the ATD, both non-active 1 and non-active 2 do not display 
the ~15° rotation of the GluN1b and GluN2B subunits relative to one 
another as observed in the crystal structure of the apo-GluN1b-GluN2B 
ATD, which represents a conformation that can activate the recep- 
tor. The arrangements of the dimer of the GluN1b-GluN2B ATD 

Figure 4 | Changes in conformation and heterotetrameric subunit 
arrangement in the ATD during ifenprodil inhibition and receptor 
activation. a-f, Comparison of 3D classes shows movements (arrowed arcs) 
of the bi-lobed architecture in the GluN2B ATD (a, d), and rearrangement 
of the GluN1b-GluN2B ATD heterodimer (b, e) and heterotetramer (c, f) 
during transition from the agonist-bound ‘non-active 2’ conformation 
(same colour code as in Fig. 3) to the ifenprodil-bound ‘inhibited’ 
conformation (PDB ID, 4PE5) (grey) (a-c) or to the ‘active’ conformation 




dimers (Fig. 4c), as well as the dimer of GluN1b-GluN2B LBD dimers 
(Fig. 5a, c, e) are similar to those observed in the crystal structure of 
the intact NMDA receptor (‘inhibited’ conformation) (Figs 4 and 5). 
Consequently, the ion channel pores at the TMD remain closed, 
confirming that both cryo-EM classes probably represent non-gating 
or ‘non-active’ conformations. The difference between non-active 1 and 
non-active 2 is the extent of bi-lobe opening in the GluN2B ATD, 
where non-active 2 has an ~7° more open conformation resulting 
in ~13 Å larger separation between the GluN1 ATDs (Fig. 3). Even 
though we tentatively call these two conformations ‘non-active’, it 
remains uncertain whether they represent functional states equiva- 
lent to the ‘pre-open’ state observed in non-NMDA receptors!”*! or a 
‘desensitized’ state. 


Active conformation 

One of the cryo-EM classes, ‘active’ (Fig. 3), shows the cleft of the 
bi-lobed GluN2B ATD architecture opened by ~22° and a GluN1b- 
GluN2B heterodimeric subunit rotated by ~12°, compared to the 
ifenprodil-bound intact NMDA receptors, which is notably similar 
to the apo-GluN1b-GluN2B ATD crystal structure representing the 
active ATD conformation (Figs 1 and 4e and Extended Data Fig. 7). 
In the heterotetrameric NMDA receptor, the GluN1b-GluN2B 

(yellow) (d-f). Schematic diagrams are shown below each panel. 
a, d, Superimposition of GluN2B ATD R1 lobes shows relative movement 
of R2 (around black rods) (a, d) and rearrangement in the pattern of 
subunit arrangement (b, e) in different functional states. c, f, GluN1b- 
GluN2B ATD heterotetramers from different 3D classes are compared by 
aligning the centres of masses (COMs) of the ATD heterotetramer, LBD 
heterotetramer and individual LBDs. Ifenprodil shown as green sticks. 


heterodimer pairs rotate by ~12° in opposite directions (Fig. 4f). 
Importantly, this 3D structure of the active conformation of the ATD 
also shows large differences in the subunit arrangement of LBDs 
compared to the other 3D classes representing ‘non-active’ ATDs, 
and is also different from the recent crystal structures of the glycine, 
L-glutamate and ifenprodil complexes*””?. Specifically, when transi- 
tioning from the non-active 2 to active conformation, the two pairs of 
GluN1b-GluN2B LBD heterodimers rotate by ~13.5° (Fig. 5b, d, f). 
These subunit movements in the LBD cause movement of the res- 
idues at the LBD-TMD linkers (Fig. 5). For example, when focus- 
ing on the residues located right above the pore formed by the M3 
TMD helices (Fig. 6a), the concerted movement between the ATD 
and LBD going from non-active 2 to active described above causes 
a vertical movement of GluN1b Arg684 and the lateral separation 
of GluN2B Glu658 by 7 Å and 11 Å, respectively, to dilate the gat- 
ing ring, a movement that is likely to lead to ion channel gating” 
(Figs 5d, f and 6, Supplementary Videos 1 and 2). Thus, this cryo-EM 
class is structurally and functionally consistent with an ‘active’ 
conformation for GluN1-GluN2B NMDA receptors. Although there 
is clear density for most of the domains in the active conformation 
of the receptor, the density for the TMD is not resolved in sufficient 
detail to directly observe opening of the ion channel, as is the case 

Figure 5 | Conformational changes at the LBD during ifenprodil 
inhibition and receptor activation. a-f, The same superimposition as in 
Fig. 4c, f showing the LBD tetramers in the ‘non-active 2’ (same colour 
code as in Fig. 3), the ‘inhibited’ (grey) (a, c, e) and the ‘active’ (yellow) 
states (b, d, f) viewed from the ATD (a, b), side (c, d) and TMD (e, f). The 
LBD heterodimers rotate (around black rods) during transition from 

Figure 6 | Concerted movement of the ATD and LBD opens the gate. 
a, Structural comparison of NMDA receptors in the ‘non-active 2’ (same 
colour code as in Fig. 3) and the ‘active’ (yellow) conformations, as in 
Figs 4 and 5. The M1 and M4 helices of the TMD are omitted for clarity. 
The arrowed arcs indicate rotation from non-active 2 to active. The first 
ordered residues on the linker between the M3 helices on the TMD and 
LBD in the active structure (GluN1b Arg684 and GluN2B Glu658) are 
shown as spheres. b, c, Schematic diagram viewed from the side of the 
tetramer (b) and top of the ATD (c). GluN1b Arg684 and GluN2B Glu658 
are shown as green spheres and the residues at the ATD-LBD linker 
(GluN1b Ser416 and GluN2B His405) are shown as yellow spheres. 

for the AMPA receptors’’. This may indicate that the TMD domain 
is structurally more variable in activated receptors compared to non- 
active receptors. Finally, the comparison of the cryo-EM classes with 
GluA2 AMPA receptor in the pre-open state, which represents a closed 
channel!”*!, shows that there is a greater difference between the active 
and pre-open states than between the non-active 2 and pre-open states 
(Extended Data Fig. 8), consistent with our observation that the TMD 
ion channel in the non-active structures is also closed. 


Conclusion 

We report conformational changes in multiple domains that are exper- 
imentally linked to activation of mammalian GluN1b-GluN2B NMDA 
receptors. The activation requires opening of the bi-lobed architecture 
of the GluN2B ATD and reorientation of the heterodimeric arrange- 
ment in the GluN1b-GluN2B ATD, as captured at high-resolution 
by the crystal structure presented here. These changes lead to rotated 
GluN1b-GluN2B heterodimeric pairs in both the ATD and LBD, caus- 
ing dilation of the gating ring. The mechanistic understanding gained 
in the current study represents an important first step in understanding 
the sophisticated activation schemes”*?* that are essential for mam- 
malian NMDA receptor function. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 

METHODS 


Data reporting. No statistical methods were used to predetermine sample size. 
The experiments were not randomized and the investigators were not blinded to 
outcome assessment. 

Production of GluN1b/GluN2B ATD, GluN1b/GluN2B NMDA receptors and 
Fab fragment. The constructs of GluN1b and GluN2B ATDs are identical to those 
used in our previous study, and were expressed and purified in the same way”!. The 
purified protein was deglycosylated by endoglycosidase F1. Monoclonal antibodies 
(mouse IgG) that bind rat GluN2B ATD were obtained by immunizing mice with 
the purified intact GluN1-GluN2B NMDA receptors using the standard proto- 
col. IgGs were purified from hybridoma cell culture supernatant by Protein-A 
Sepharose (GE Healthcare). Fab fragments of the antibody were obtained by papain 
proteolysis followed by re-chromatographing onto Protein-A Sepharose to remove 
the Fc region. The purified GluN1b-GluN2B ATD and Fab were mixed, and the 
ATD-Fab complex was isolated by Superdex200 (10/300; GE Healthcare). The 
intact tetrameric GluN1b-GluN2B NMDA receptors were expressed and purified 
as previously described”. 

Crystallization, data collection and structural determination of apo-GluN1b- 
GluN2B ATD. The purified GluN1b-GluN2B ATD-Fab complex was concen- 
trated to 8 mg ml⁻¹ and dialysed against a buffer containing 10 mM Tris-HCl 
(pH 8.0) and 50 mM NaCl. The crystals were grown at 18°C by the hanging-drop 
vapour diffusion method. GluN1b-GluN2B ATD-Fab complex was mixed with 
a half volume of reservoir solution (3-5 µl total drop size), which contained 0.1 M 
sodium acetate (pH 4.5), 27% PEG3350, 2.2 M sodium formate, and 0.05 M calcium 
chloride. Cryoprotection was achieved by supplying 8% glycerol to the crystal- 
lization condition. Crystals were flash-frozen in liquid nitrogen. Data sets were 
collected at the wavelength of 1.0 Å and at the 23ID-D beamline in the Advanced 
Photon Source in the Argonne National Laboratory and processed using HKL2000 
(ref. 35) (Extended Data Table 1). The crystal structure of GluN1b-GluN2B 
ATD-Fab17 complex was solved by molecular replacement using the coordinates 
of GluN1b/GluN2B ATD (PDB ID, 3QEL) and Fab (PDB ID, 1BAF) and by using 
the program Phaser**. The model refinement was performed using the program 
Phenix*”. 

Electrophysiology. GluN1-4b/GluN2B NMDA receptors were expressed by 
injecting cRNAs at a 1:2 ratio (GluN1:GluN2, w/w) into defolliculated Xenopus 
laevis oocytes (0.05-0.15 ng total per oocyte). After 24-48 h incubation at 18°C, 
currents were measured by two-electrode voltage clamp in a solution containing 
5 mM HEPES, 100 mM NaCl, 0.3 mM BaCl₂ and 10 mM Tricine at pH 6.5 (adjusted 
with KOH) using agarose-tipped microelectrodes (0.4-1.0 MΩ) at the holding 
potential of −60 mV. Currents were evoked by application of 100 µM glycine and 
L-glutamate. For MTS experiments, fresh stocks of MTS reagents were made and 
added to the recording buffers at the final concentration of 200 µM. The data were 
analysed using the program Pulse (HEKA) and the graphs were generated by the 
program Kaleidagraph (Synergy). 
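
The fold of potentiation reported in Fig. 2 is the ratio I_MTS/I_0, summarized as mean ± s.d. over oocytes. The following is only a minimal Python sketch of that calculation and is not the analysis pipeline used here (the data were analysed with Pulse and Kaleidagraph). The function names, the choice of the sample standard deviation, and the current values are all assumptions made for illustration; the numbers are placeholders, not measurements from this study.

```python
from statistics import mean, stdev


def fold_potentiation(i_mts, i_0):
    """Ratio I_MTS / I_0 for one oocyte (both currents in the same units)."""
    if i_0 == 0:
        raise ValueError("baseline current I_0 must be non-zero")
    return i_mts / i_0


def summarize(recordings):
    """Mean and sample s.d. of the fold potentiation across oocytes.

    Assumption: the sample (n - 1) standard deviation is used; the text
    does not state which estimator was applied.
    """
    folds = [fold_potentiation(i_mts, i_0) for i_mts, i_0 in recordings]
    return mean(folds), stdev(folds)


# Hypothetical (I_MTS, I_0) pairs in µA at -60 mV; NOT data from this study.
example = [(-3.2, -1.0), (-3.6, -1.0), (-4.0, -1.0), (-3.4, -1.0), (-3.8, -1.0)]
avg, sd = summarize(example)
print(f"fold potentiation = {avg:.2f} ± {sd:.2f} (n = {len(example)})")
```

Because the ratio is taken per oocyte before averaging, differences in expression level between oocytes cancel out of each fold value.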

Cysteine crosslinking and western blot. Recombinant wild-type and mutant 
GluN1-4b/GluN2B NMDA receptors (GluN2B CTD truncated as in Extended 
Data Fig. 1), were expressed in the Spodoptera frugiperda (Sf9) baculovirus system 
as described previously”. The infected cell pellets were solubilized in a buffer 
containing 50 mM HEPES pH 7.3, 200 mM NaCl, 0.5% LMN, and 1 mM PMSF. 
The GluN1-4b/GluN2B NMDA receptor proteins were purified by Strep-Tactin 
Sepharose (IBA) and subjected to 7% SDS-polyacrylamide gel electrophoresis 
in the presence and absence of 100 mM β-mercaptoethanol. The proteins were 
transferred to nitrocellulose membranes (GE Healthcare). The membranes were 
blocked with 5% milk in a phosphate saline buffer containing 0.05% Tween-20, 
incubated with mouse monoclonal anti-GluN1 antibody (MAB1586, Millipore) 
or anti-GluN2B antibody (AB93610, Abcam), followed by horseradish peroxidase 
(HRP)-conjugated anti-mouse secondary antibodies (GE Healthcare). The ECL 
detection kit (GE Healthcare) was used to visualize bands. 

Cryo-EM specimen preparation and image acquisition. Purified GluN1b/ 
GluN2B NMDA receptor at 2 mg ml⁻¹ was placed on C-flat 1.2/1.3 Cu 400 mesh 
grids (Protochips), which had previously been subjected to glow discharge for 45 s 
at 15 mA, and plunge-frozen using an FEI Vitrobot Mark 2 with a 3 s blot time and 
at relative humidity between 85% and 95%. The data were collected on an FEI Titan 
Krios microscope operating at 300 kV. 1,204 movies were collected on a Gatan 
K2 Summit direct electron detector (Gatan, Inc.) in super resolution mode with 
a pixel size of 0.655 Å per super resolution pixel. Each exposure was 21 s long and 
recorded as a movie of 70 frames. The exposure per frame as reported by Digital 
Micrograph (Gatan, Inc.) was ~1.4 e⁻ per Å², which corresponds to an exposure of 
~8 electrons per pixel per second on the camera. Movies were recorded at a range 
of underfocus between ~1.0 µm and ~2.5 µm. 
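
The exposure figures quoted above can be cross-checked with simple arithmetic. The sketch below is a worked check in Python, not part of the acquisition software. It assumes that "pixel" in the per-second rate means a physical detector pixel, that is, two super-resolution pixels per side (1.31 Å, the pixel size obtained after two-fold downsampling). The function names are illustrative.

```python
def frame_duration_s(exposure_s, n_frames):
    """Duration of a single movie frame in seconds."""
    return exposure_s / n_frames


def physical_pixel_size_a(super_res_pixel_a, factor=2):
    """Physical pixel size in Å, assuming `factor` super-resolution pixels per side."""
    return super_res_pixel_a * factor


def rate_e_per_pixel_per_s(dose_per_frame_e_per_a2, pixel_size_a, frame_s):
    """Convert a per-frame dose (e-/Å^2) to a rate in e- per pixel per second."""
    return dose_per_frame_e_per_a2 * pixel_size_a ** 2 / frame_s


frame_s = frame_duration_s(21.0, 70)      # 21 s exposure, 70 frames
pixel_a = physical_pixel_size_a(0.655)    # 0.655 Å super-resolution pixel
rate = rate_e_per_pixel_per_s(1.4, pixel_a, frame_s)
total_dose = 1.4 * 70                     # accumulated dose in e-/Å^2

print(f"frame duration: {frame_s:.2f} s")
print(f"physical pixel: {pixel_a:.2f} Å")
print(f"dose rate: {rate:.1f} e-/pixel/s")
print(f"total dose: {total_dose:.0f} e-/Å^2")
```

Under this assumption the per-frame dose of ~1.4 e⁻ per Å² reproduces the quoted rate of ~8 electrons per pixel per second, and the total exposure over 70 frames comes to roughly 98 e⁻ per Å².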

Image processing. Super-resolution movie frames were initially corrected for mag- 
nification distortion*®. The frames were then downsampled by a factor of 2 using 
Fourier cropping to a pixel size of 1.31 Å, motion-corrected and exposure filtered 
using Unblur® and the microscope CTF was determined using CTFFIND4” on 
motion-corrected but non-exposure filtered movie sums. Around 90,000 particles 
were picked automatically then verified manually from the aligned movie sums 
which had been exposure filtered, but not noise restored, resulting in a strong low- 
pass filter. The picked particles were extracted into 256 × 256 boxes. Initial particle 
alignment parameters were assigned by a brute force search in Frealign v9*', sam- 
pling every 5° and limiting the resolution to 15 Å using a previously determined 
structure as a reference. These parameters were further refined and classified into 
six 3D classes with Frealign. For classes 1, 3 and 6, the highest resolution included 
in the alignment was 8 Å, for class 4 the highest included resolution was 12 Å, 
and for class 5 it was 6.5 Å. Class 2 showed only low-resolution features and was 
discarded. The resulting resolutions as determined by the 0.143 cut-off were 
5.0-6.7 Å (Extended Data Fig. 6). Maps were rendered using UCSF Chimera’, 
after applying a B-factor of −600 Å². 
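
The 0.143 cut-off refers to the Fourier shell correlation (FSC) curves shown in Extended Data Fig. 6: the reported resolution is the reciprocal of the spatial frequency at which the curve first drops below 0.143. The Python sketch below illustrates that read-out with linear interpolation between sampled shells. It is a generic illustration and not the routine implemented in Frealign; the function name and the example curve are hypothetical, and the curve is assumed to start above the threshold.

```python
def resolution_at_threshold(freqs, fsc, threshold=0.143):
    """Resolution (Å) at which an FSC curve first falls below `threshold`.

    freqs: spatial frequencies in 1/Å, in ascending order.
    fsc:   FSC value for each frequency shell.
    Returns None if the curve never crosses the threshold.
    """
    for i in range(1, len(freqs)):
        hi, lo = fsc[i - 1], fsc[i]
        if hi >= threshold > lo:
            # Linear interpolation between the two shells bracketing the crossing.
            t = (hi - threshold) / (hi - lo)
            f_cross = freqs[i - 1] + t * (freqs[i] - freqs[i - 1])
            return 1.0 / f_cross
    return None


# Hypothetical FSC curve for illustration only; NOT data from this study.
freqs = [0.05, 0.10, 0.15, 0.20, 0.25]
fsc = [0.99, 0.90, 0.50, 0.10, 0.02]
res = resolution_at_threshold(freqs, fsc)
print(f"resolution at FSC = 0.143: {res:.2f} Å")
```

Interpolating between shells avoids quantizing the answer to the shell spacing, which matters at these resolutions because neighbouring shells can differ by more than 1 Å.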

Model building. The GluN1a-GluN2B crystal structure (PDB ID, 4PE5)” was 
docked into the cryo-EM maps followed by rigid-body fitting of the individual 
ATD R1 and R2 lobes and LBDs of both GluN1 and GluN2B into the cryo-EM 
maps using Coot“. Both the rat GluN1a-GluN2B crystal structure (PDB ID, 
4PE5)” and Xenopus GluN1-GluN2B NMDA receptor (PDB ID, 4TLM)”? were 
used to model the TMD. The resulting models were manually modified to fit into 
the density using Coot and refined against the cryo-EM maps using Phenix real 
space refinement*®. Refinement statistics are shown in Extended Data Table 2. 

Extended Data Figure 1 | Domain organization and constructs. 
a, The construct design for GluN1b and GluN2B ATD used in this study. 
GluN1b from Xenopus laevis is combined with GluN2B from rat, as in 
the previous study on the ATD™*. b, The construct design for the intact 
GluN1b/GluN2B NMDA receptors from rat. A similar construct was used 
in previous studies and shown to be fully functional”. 

Extended Data Figure 2 | Structure of the apo-GluN1b-GluN2B ATD. 
a, Representative 2Fo − Fc electron density map contoured at 1.2σ showing 
continuous density throughout GluN1b, GluN2B and Fab. The quality of 
the electron density map is at a sufficient level to model amino acid side 
chains (see lower panel). b, Crystal packing of GluN1b-GluN2B ATD-Fab 
showing that the packing is mediated robustly by Fab molecules (green). 
The colour coding for the ATD is the same as in Fig. 1. c, Comparison of 
the apo-GluN1b-GluN2B ATD and ifenprodil-GluN1b-GluN2B ATD 
(grey) by stereo presentation. Colour coding for the apo-GluN1b-GluN2B 
ATD is the same as in Fig. 1. Here the two structures are superimposed at 
GluN2B R1. 

Extended Data Figure 3 | Validation of the crystal structure by disulfide 
cross-linking. a, Crystal structure of the apo-GluN1b-GluN2B ATD 
showing locations of the mutated residues, GluN1b Phe113, GluN1b 
Gly331, GluN2B Ala107 and GluN2B Glu75 in spheres. b, Western 
blots using anti-GluN1 (left) and anti-GluN2B (right) antibodies on 
purified intact GluN1b/GluN2B NMDA receptor that lacks the CTD. 
Upper and lower panels are blots run in the absence and presence of 
β-mercaptoethanol (βME), respectively. Bands highlighted by arrow 1 are 
consistent with the molecular weight of GluN1-GluN2B heterodimers, 
whereas those highlighted by arrows 2 and 3 are consistent with the 
molecular weights of monomers of GluN1-4b and GluN2B. 

Extended Data Figure 4 | Conformational trap shows the apo-GluN1b- 
GluN2B ATD structure to represent ‘active’ form-II. a, Location of 
engineered cysteines in the crystal structure of the apo-GluN1b-GluN2B 
ATD. The cysteine mutant pairs, GluN1-4b(Ala175Cys)/GluN2B(Gln180Cys) 
(green spheres) and GluN1-4b(Lys178Cys)/GluN2B(Asn184Cys) 
(blue spheres) are co-expressed in Xenopus oocytes and cross-linked by 
bifunctional MTS with different linker lengths (M2M, M4M and M8M). 
b, Application of 200 µM M4M in the presence or absence of 100 µM 
agonists (glycine (gly)/glutamate (glut)) potentiates the macroscopic 
current measured at the holding potential of −60 mV by two-electrode 
voltage clamp. No potentiation was observed when M4M was applied 
in the presence of ifenprodil (ifen). Shown here are the representative 
recording profiles for the GluN1-4b(Lys178Cys)/GluN2B(Asn184Cys) 
pair. 

Extended Data Figure 5 | Effect of bi-MTS on cysteine mutants. 
a, b, M4M specifically traps the active conformation at the engineered 
cysteines. Representative electrophysiological traces for the mutant 
pairs, GluN1-4b(Ala175Cys)/GluN2B(Gln180Cys) (green spheres) 
and GluN1-4b(Lys178Cys)/GluN2B(Asn184Cys) as well as mutant and 
wild-type pairs. The experiments were conducted by two-electrode voltage 
clamp as in Fig. 2. The potentiation by M4M (represented by I_MTS/I_0) is 
only observed when both GluN1 and GluN2B cysteine mutants are 
co-expressed. No potentiation was observed when the cysteine mutant of 
one subunit is combined with the wild type of the other, indicating that the 
effect of M4M modification is specific and validating the relevance of the 
experiments. c, d, Bar graphs presenting the degree of potentiation from 
the recordings in a and b. Error bars represent ±s.d. for data obtained 
from five different oocytes per mutant combination. e-g, M2M but not 
M8M potentiates the mutant GluN1b-GluN2B NMDA receptor. The same 
experiment as above or in Fig. 2 was conducted using M2M or M8M on 
GluN1-4b(Ala175Cys)/GluN2B(Gln180Cys) (e), GluN1-4b(Lys178Cys)/ 
GluN2B(Asn184Cys) (f), and wild-type GluN1-4b/GluN2B (g). Shown are 
representative electrophysiological recordings used to estimate the degree 
of bi-MTS potentiation presented in Fig. 2c. 

Extended Data Figure 6 | Cryo-EM analysis on GluN1b/GluN2B 
NMDA receptors. a, Representative motion-corrected image collected 
at 22,500× magnification. b, Two-dimensional class averages. c, d, Fourier 
shell correlation curves for unmasked data (c) and model versus 
electron microscopy map (d). Class X and Y are similar to ‘non-active 2’ 
and were not further analysed. e, Orientation plots for each class, plotting 
the distribution of Euler angles assigned to all particles contributing to 
that class with an occupancy of at least 80%. For each class, the number 
of particles which have that class as their highest occupancy value is also 
shown. 

Extended Data Figure 7 | Representative cryo-EM density, model 
fit and structural comparison of the ATD in inhibited and active 
conformations of cryo-EM structures. a-e, Here, the cryo-EM maps 
for non-active 2 and active 3D classes are shown along with the refined 
models. Densities are shown at the ATD and LBD (a-d) for both of the 
3D classes and at the TMD (e) for non-active 2. f, g, Superimposition of 
R1 lobes of GluN1b (f) and GluN2B (g) illustrates the relative ‘opening’ 
between R1 and R2 lobes in the inhibited and active forms of intact 
NMDA receptors. The extent of GluN2B ATD opening is similar to that 
observed between the crystal structures of the ifenprodil-GluN1b-GluN2B 
ATD and the apo-GluN1b-GluN2B ATD as in Fig. 1. GluN1b and 
GluN2B ATDs are shown in grey and yellow for the inhibited and 
active states, respectively. h, Comparison of the GluN1b-GluN2B ATD 
heterodimers between ifenprodil-inhibited and active cryo-EM structures. 
Superimposition of the GluN2B R1 lobes reveals an ~12° rotation of the 
GluN1b ATD relative to the GluN2B ATD in a manner similar to the 
crystal structure of the apo-GluN1b-GluN2B ATD as in Fig. 1. The black 
rods indicate the axis of rotation between the two cryo-EM structures. 
The distance of the R2 lobes in the GluN1b-GluN2B heterodimers is 
measured between Cα atoms of GluN1b(Lys178) and GluN2B(Asn184) 
(green spheres). 

Extended Data Figure 8 | Structural comparison of the GluN1-GluN2B 
LBD in non-active and active conformations to the GluA2 LBD in 
pre-open state. a-d, The crystal structure of GluA2 AMPA receptor in 
the pre-open state (PDB ID, 4U1W; shown in green) aligned with the 
structures of GluN1-GluN2B in the non-active 2 (blue) (a, b) and active 
conformation (yellow) (c, d) by superimposing the LBDs of GluN2B onto 
GluA2. e-h, The equivalent superimposition with the cryo-EM structure 
of GluA2 AMPA receptor in the pre-open state (PDB ID, 4UQ6; shown in 
green). The overlaid structures are viewed through the LBD heterodimer 
interface (a, c or e, g) and the dimer of heterodimer interface (b, d or f, h). 
Here, the GluN2B LBD of the GluN1b-GluN2B NMDA receptor is 
superimposed onto the LBD of the GluA2 AMPA receptor and the shift of 
the GluN1 LBD is measured with respect to the other GluA2 LBD. 
The homodimeric arrangement of GluA2 AMPA receptor in the pre-open 
state is similar to the heterodimeric arrangements of GluN1b-GluN2B 
NMDA receptors in both non-active 2 and active states (a, c or e, g). 
However, when the dimer of homodimer arrangement of GluA2 AMPA 
receptor is compared to the dimer of heterodimers arrangement of the 
GluN1b-GluN2B NMDA receptor, a greater difference is observed for the 
active NMDA receptor (d, h) than for the non-active 2 NMDA receptor 
(b, f). Here, the non-active 2 NMDA receptor as in Fig. 3 is subjected to 
superimposition. The non-active 1 and non-active 2 NMDA receptors 
have similar subunit arrangements in the LBD. The numbers in each panel 
represent degrees of rotations and translations. 

Extended Data Table 1 | Data collection and refinement statistics for X-ray crystallography 

                          Apo-GluN1b/GluN2B ATD-Fab17 
Data collection 
  Space group             C2 
  Cell dimensions 
    a, b, c (Å)           247.4, 80.0, 181.4 
    α, β, γ (°)           90.0, 127.2, 90.0 
  Resolution (Å)          50-2.90 (2.93-2.90)* 
  Rmerge                  0.099 (0.602) 
  I/σI                    8.5 (1.84) 
  Completeness (%)        91.4 (93.0) 
  Redundancy              4.0 (3.5) 
Refinement 
  Resolution (Å)          30-2.9 
  No. reflections         52,910 
  Rwork/Rfree             0.273/0.302 
  No. atoms 
    Protein               15,899 
    Ion (Na)              1 
    Water                 112 
  B-factors 
    Protein               46.3 
    Ligand/ion            52.1 
    Water                 37.9 
  R.m.s. deviations 
    Bond lengths (Å)      0.008 
    Bond angles (°)       1.068 

*Highest resolution shell is shown in parentheses. 

Extended Data Table 2 | Refinement statistics for the cryo-EM structures 

Active Non-Active 1 Non-Active 2 Class X Class Y 
PDB ID 5FXG 5FXH 5FXI 5FXJ 5FXK 
EMDB ID EMD-3352 EMD-3353 EMD-3354 EMD-3355 EMD-3356 
Refinement 
Resolution (A) 6.8 6.1 6.4 6.5 6.4 
Map to Model CC 0.77 0.79 0.80 0.78 0.80 
No. atoms 
Protein 12,693 15,578 15,381 15,475 15,598 
R.m.s deviations 
Bond lengths (A) 0.003 0.006 0.003 0.004 0.005 
Bond angles (°) 0.55 0.55 0.60 0.58 0.55 
Ramachandran 
Favored (%) 88.3 90.4 90.2 88.0 88.8 
Allowed (%) 11.5 9.2 9.4 11.4 10.7 
Disallowed (%) 0.2 0.5 0.4 0.6 0.5 
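
The Ramachandran rows of Extended Data Table 2 should each sum to roughly 100%, up to rounding. The short check below is only an illustrative sketch: the values are copied from the table, and the dictionary and function names are hypothetical. Decimal arithmetic is used so that one-decimal percentages add up exactly.

```python
from decimal import Decimal

# (favored, allowed, disallowed) percentages from Extended Data Table 2.
RAMACHANDRAN = {
    "Active": ("88.3", "11.5", "0.2"),
    "Non-Active 1": ("90.4", "9.2", "0.5"),
    "Non-Active 2": ("90.2", "9.4", "0.4"),
    "Class X": ("88.0", "11.4", "0.6"),
    "Class Y": ("88.8", "10.7", "0.5"),
}


def ramachandran_totals(table):
    """Return favored + allowed + disallowed for each structure."""
    return {name: sum(Decimal(v) for v in values) for name, values in table.items()}


totals = ramachandran_totals(RAMACHANDRAN)
for name, total in totals.items():
    print(f"{name}: {total}%")
```

A total that differs from 100 by 0.1 is consistent with independent rounding of the three categories.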
Structure of spinach photosystem II-LHCII supercomplex
at 3.2 Å resolution


Xuepeng Wei*, Xiaodong Su*, Peng Cao, Xiuying Liu, Wenrui Chang, Mei Li, Xinzheng Zhang & Zhenfeng Liu


During photosynthesis, the plant photosystem II core complex receives excitation energy from the peripheral
light-harvesting complex II (LHCII). The pathways along which excitation energy is transferred between them, and their
assembly mechanisms, remain to be deciphered through high-resolution structural studies. Here we report the structure
of a 1.1-megadalton spinach photosystem II-LHCII supercomplex solved at 3.2 Å resolution through single-particle
cryo-electron microscopy. The structure reveals a homodimeric supramolecular system in which each monomer contains
25 protein subunits, 105 chlorophylls, 28 carotenoids and other cofactors. Three extrinsic subunits (PsbO, PsbP and
PsbQ), which are essential for optimal oxygen-evolving activity of photosystem II, form a triangular crown that shields
the Mn4CaO5-binding domains of CP43 and D1. One major trimeric and two minor monomeric LHCIIs associate with
each core-complex monomer, and the antenna-core interactions are reinforced by three small intrinsic subunits (PsbW,
PsbH and PsbZ). By analysing the closely connected interfacial chlorophylls, we have obtained detailed insights into the
energy-transfer pathways between the antenna and core complexes.


Powered by solar energy, plants, algae and cyanobacteria convert water
and carbon dioxide into organic matter and release oxygen through
photosynthesis. In oxygenic photosynthesis, the initial photophysical
and photochemical processes are primarily mediated by two photosystems:
photosystems I (PSI) and II (PSII) (ref. 1). PSII is a supramolecular
complex embedded within the thylakoid membrane. It contains numerous
protein subunits and various cofactors, including chlorophylls,
carotenoids, an Mn4CaO5 cluster, a haem, plastoquinones and lipids (ref. 2).
A characteristic functional feature of PSII is its ability to extract electrons
from water molecules through a light-induced water-oxidizing reaction
catalysed by the Mn4CaO5 cluster (ref. 3). To collect photon energy and drive
the photochemical reactions, plant PSII contains a series of peripheral
light-harvesting complexes (the major light-harvesting complexes of
PSII (LHCII) and minor ones named chlorophyll-binding protein of 29,
26 and 24 kDa (CP29, CP26 and CP24)) (refs 4, 5). These antenna complexes
surround the core complex of PSII, absorb light energy and transmit it
to the reaction centre to induce charge separation in the special pair of
chlorophylls named P680 (ref. 6).

The highest resolution at which the crystal structure of cyanobacterial
PSII has been solved is 1.9 Å (refs 7, 8), after a decade-long optimization
process starting from 3.8 Å resolution (reviewed in ref. 2).
The structures of cyanobacterial PSII provide solid foundations for
understanding the pathways of excitation energy transfer, electron
transport and water splitting processes occurring within the complex.
Although plant PSII has a core complex similar to that of cyanobacterial
PSII, there are major differences in their luminal extrinsic domains
and peripheral antenna systems. The structures of partial and intact
core complexes of plant PSII have been solved through cryo-electron
crystallography at 8-10 Å resolution (refs 9-11), and X-ray structures of
isolated LHCII and CP29 are available at 2.5-2.8 Å resolution (refs 12-14).
Furthermore, a 3D map of the PSII-LHCII supercomplex at 17 Å
resolution was obtained through single-particle cryo-electron microscopy
(cryo-EM) (ref. 15), and 2D projection maps of larger supercomplexes
with more antenna complexes bound have been reported at 12-13 Å
(refs 16, 17). Nevertheless, the precise pathways of excitation energy
transfer between the peripheral antennae and core complex of plant
PSII remain largely unclear owing to the lack of a high-resolution structure
of the PSII-LHCII supercomplex. Moreover, the structural roles
and mutual interactions of three important major extrinsic subunits
(PsbO, PsbP and PsbQ) in plant PSII are unknown. Here we present
a 3.2 Å resolution cryo-EM structure of spinach PSII in complex with
LHCII, CP29, CP26 and four extrinsic proteins. Unprecedented details
concerning the specific interactions between different components
within the supercomplex are revealed.


Overall architecture 

The PSII-LHCII supercomplex sample for the cryo-EM study was purified
from spinach leaves. Its spectroscopic features, protein composition
and pigment content analysis results are summarized in Extended Data
Fig. 1. From a total of 1,774 cryo-EM micrographs collected (Fig. 1a),
192,071 particles were picked for further data processing. After 2D and
3D classification (Fig. 1b), 109,042 C2S2-type (C: PSII core complex;
S: strongly associated LHCII trimer) particles were selected and processed
with local motion correction and image polishing, leading to a cryo-EM
map at an overall resolution of 3.2 Å (Fig. 1c, d and Extended Data Fig. 2;
see Methods for more details). The actual resolution within the C2S2-type
PSII-LHCII supercomplex varies from 3.0 Å in the core region to
4.0 Å in some of the peripheral regions (Extended Data Fig. 2b). An
atomic model of the PSII-LHCII supercomplex has been constructed
and refined against the 3.2 Å cryo-EM map (Extended Data Fig. 2c; see
Methods for details).
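
To put the particle numbers above in perspective, the fraction of picked particles that survived classification can be computed directly. This is a minimal sketch using only the counts quoted in the text; the function and constant names are hypothetical and do not correspond to any processing software used in the study.

```python
def retained_fraction(selected: int, picked: int) -> float:
    """Fraction of picked particles kept after classification."""
    if picked <= 0:
        raise ValueError("picked must be positive")
    return selected / picked


MICROGRAPHS = 1_774   # cryo-EM micrographs collected
PICKED = 192_071      # particles picked for further processing
SELECTED = 109_042    # C2S2-type particles kept after 2D and 3D classification

fraction = retained_fraction(SELECTED, PICKED)
print(f"retained after classification: {fraction:.1%}")
print(f"particles picked per micrograph: {PICKED / MICROGRAPHS:.0f}")
```

By these counts, a little over half of the picked particles contributed to the final map.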

Figure 1 | Single-particle cryo-EM analyses of spinach PSII-LHCII
supercomplex. a, A representative cryo-EM image of a spinach PSII-LHCII
supercomplex sample. b, Selected 2D class averages of C2S2-type
PSII-LHCII supercomplex particles. c, d, 3D cryo-EM density map of the
PSII-LHCII supercomplex. c, Side view along membrane plane. d, Bottom
view from luminal side along membrane normal. One monomer of the
supercomplex is shown in a four-colour mode with the PSII core in yellow,
LHCII in green, CP26 in purple and CP29 in orange; the other monomer is
shown in a single grey colour.


As shown in Fig. 2, the spinach C2S2-type PSII-LHCII supercomplex
forms a homodimer with two-fold symmetry running through
the centre along the membrane normal. The dimerization interface in
the core region closely resembles those of cyanobacterial PSIIs (refs 7, 18, 19),
and LHCII and CP29 at the peripheral regions extend the interface
and enhance the stability of the dimeric supercomplex. Each monomer
contains a core complex composed of four large intrinsic subunits
(D1, D2, CP43 and CP47), twelve low-molecular-mass membrane-spanning
subunits (PsbE, PsbF, PsbH, PsbI, PsbJ, PsbK, PsbL, PsbM,
PsbTc, PsbW, PsbX and PsbZ) (Fig. 2a-c), and four extrinsic subunits
attached on the luminal surface (PsbO, PsbP, PsbQ and PsbTn)
(Fig. 2d). The major intrinsic subunits (D1, D2, CP43 and CP47) of
spinach PSII core share high similarities with those of cyanobacterial
PSII. The amino acid sequences of spinach D1 (PsbA), D2 (PsbD),
CP47 (PsbB) and CP43 (PsbC) proteins are 85%, 90%, 77% and 84%,
respectively, identical to those from Thermosynechococcus vulcanus
photosystem II (TvPSII) (ref. 7). Their structures can be superposed
on those of TvPSII with root mean square deviations (r.m.s.d.) of
α-carbon atoms at 0.57 (D1), 0.64 (D2), 0.68 (CP47) and 0.69 Å (CP43),
respectively (Extended Data Fig. 3a). Moreover, the cofactor binding
sites within the core complex are well conserved between the two
species.
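
For readers unfamiliar with the r.m.s.d. values quoted above, the quantity is the square root of the mean squared distance between matched atoms. The sketch below assumes the two coordinate sets have already been superposed and that atoms are listed in matching order; it uses hypothetical toy coordinates, not the deposited models, and does not perform the superposition itself.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root mean square deviation between two matched, pre-superposed
    sets of 3D coordinates (for example equivalent alpha-carbon atoms)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have the same length")
    if not coords_a:
        raise ValueError("coordinate sets must not be empty")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical toy coordinates in angstroms, for illustration only.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(0.0, 0.0, 0.5), (3.8, 0.0, 0.5), (7.6, 0.0, 0.5)]
print(f"r.m.s.d. = {rmsd(reference, shifted):.2f} angstroms")
```

In practice an optimal rigid-body fit is applied before this step, which is omitted here to keep the example free of external dependencies.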

Surrounding the four major core subunits, twelve low-molecular-mass
intrinsic subunits form a discontinuous belt-like structure wrapping
around the core (Fig. 2a and Extended Data Fig. 3b, c). Among
them, eleven can find their homologues on similar binding sites in
cyanobacterial PSII (ref. 7), whereas PsbW is present only in higher
plants and algae but not in cyanobacteria (ref. 20). In red algal PSII (CcPSII
from Cyanidium caldarium) (ref. 21), a subunit found at a location adjacent
to PsbI and named ‘chain S’ may correspond to spinach PsbW. PsbZ
is the only subunit with two transmembrane helices and its N and C
termini are both positioned on the luminal surface, whereas the other
11 subunits all have a single transmembrane helix. The N termini of
the PsbE, PsbF, PsbL, PsbJ and PsbH subunits are located at the stromal
surface, whereas PsbI, PsbK, PsbM, PsbTc, PsbW and PsbX assume a
reverse topology with their N termini positioned on the luminal side.
These small intrinsic subunits are involved in dimerization of the core
complex (PsbTc, PsbL and PsbM), stabilization of the core (PsbK, PsbJ,
PsbE, PsbF and PsbX), mediating the association of peripheral antenna
complexes with the core complex (PsbW, PsbZ and PsbH), and binding
cytochrome b559 to protect PSII from photodamage (PsbE and PsbF) (ref. 22).

Figure 2 | Overall architecture of spinach PSII-LHCII supercomplex.
a, b, Structure of the spinach C2S2-type PSII-LHCII supercomplex.
a, View from the stromal side along membrane normal. b, Side view along
membrane plane. Dashed lines indicate estimated interfacial regions
between the two monomers. The major components are shown as cartoon
and stick models in different colours and the 12 small intrinsic subunits
are shown as yellow sphere models. c, Cartoon diagram showing the
assignment of membrane-embedded subunits of the supercomplex. Only
one monomer is shown and the colour codes are consistent with those in
a. d, Lumen-exposed regions of the supercomplex. The view is along the
membrane normal from the luminal side.


At the outer region of the core, one LHCII trimer and one CP26
monomer flank the side near CP43, and one CP29 monomer is associated
with CP47 on the other side (Fig. 2a, c). Within each monomer
of the dimeric supercomplex, a large number of cofactors have been
located, as summarized in Extended Data Table 1. They include 105
chlorophyll (Chl) molecules, 28 β-carotene and xanthophyll molecules, one
haem, one Mn4CaO5 cluster, one plastoquinone and numerous lipid
molecules. Whereas the core subunits (CP43, CP47, D1 and D2) bind
only Chl a and β-carotene molecules, LHCII, CP29 and CP26 contain
both Chl a and Chl b, and three different xanthophylls (lutein,
neoxanthin and violaxanthin)*!*”°.


The extrinsic subunits 

On the luminal side of the core complex, we have located the binding
sites for four extrinsic subunits, namely PsbO, PsbP, PsbQ and PsbTn
(Fig. 3a, b and Extended Data Fig. 4a). Among them, PsbO, PsbP and
PsbQ form a triangular crown-like structure encircling the luminal
domain of CP43 and the C-terminal tail of D1 (Fig. 3b, c). These parts
of CP43 and D1 are directly involved in coordinating and providing
shields for the Mn4CaO5 cluster, the oxygen-evolving centre responsible
for splitting water into oxygen and protons’. By interacting with the
two Mn4CaO5-binding regions, the heterotrimeric PsbO-PsbP-PsbQ
complex serves to optimize the efficiency of oxygen evolution in PSII
under physiological conditions (ref. 24). Spinach PsbO has a characteristic
β-barrel structure similar to its homologue in TvPSII with an r.m.s.d.
of α-carbon atoms at 1.42 Å. Near the β7-β8 loop region of PsbO, PsbP
binds in a canyon between the luminal domains of CP43 and CP47
(Fig. 3b). A short loop between Asp137 and Glu140 of PsbP stabilizes
the C-terminal tails of D1 and D2 (Fig. 3c). The binding site of
PsbP partly overlaps with those of PsbV and PsbU in TvPSII (ref. 7)
and CcPSII (ref. 21) (Extended Data Fig. 4b). On the other side, the
N-terminal region of spinach PsbO binds to the four-helix bundle
domain of PsbQ, and PsbQ simultaneously interacts with the luminal
domain of CP43 (Fig. 3a). In CcPSII (ref. 21), a PsbQ’ protein resembling
spinach PsbQ was found in a similar location (Extended Data
Fig. 4b). Curiously, the elongated N-terminal region of spinach PsbQ
reaches out approximately 50 Å away from the four-helix bundle and
binds to a long loop of PsbP between Lys90 and Ala111. The interaction
between PsbP and PsbQ was previously detected through a crosslinking
method combined with mass spectrometry (refs 25, 26). Marked conformational
changes occur in the flexible regions of PsbP and PsbQ when
they bind to PSII core subunits (Extended Data Fig. 4c, d). Besides
them, the smallest subunit of plant PSII with unknown function, PsbTn
(nuclear encoded PsbT subunit) (ref. 27), intercalates between the luminal
domain of CP47 and the C-terminal region of PsbE, serving as a bridge
between them (Fig. 3b).
Figure 3 | The extrinsic subunits of spinach PSII. a, Cartoon
representation of the extrinsic subunits surrounding the oxygen-evolving
centre. The figure is shown as a stereo pair viewed along the membrane
plane. The peripheral antenna complexes are omitted for clarity. The four
extrinsic subunits (PsbO, PsbP, PsbQ and PsbTn) are shown as ribbon
models; the intrinsic subunits of the PSII core are presented as surface
models. b, The location of the four extrinsic subunits viewed from the
luminal side. The dashed-line triangle indicates the PsbO-PsbP-PsbQ
tertiary complex, which forms a crown-like structure encircling the
luminal domain of CP43 and D1. c, The role of a loop from PsbP in
stabilizing the C-terminal regions of D1 and D2. The protein backbones
are shown as ribbons; the Mn4CaO5 cluster is shown as a sphere model;
and the amino acid residues involved in coordinating the cluster are
presented as stick models. The segment of loop between Asp137 and
Glu140 in PsbP involved in contacting the C-terminal tails of D1 and D2
subunits is highlighted as a stick model.


Structures of peripheral antennae 

The cryo-EM structure of LHCII within the supercomplex is nearly
identical to the previous crystal structure of isolated LHCII (ref. 12) (Extended
Data Fig. 5). The densities and structures of CP29 and CP26 are shown
in Extended Data Figs 6 and 7. The binding sites and identities of chlo- 
rophylls and carotenoids in LHCII, CP29 and CP26 are summarized 
in Extended Data Table 2. The chromophore identities were assigned 
according to the appearance of cryo-EM densities (Extended Data 
Fig. 8a), previous high-resolution crystal structures of spinach LHCII 
(ref. 12) and CP29 (ref. 14), and the functional architecture of CP26 
(ref. 23). 

Figure 4 | Antenna-core and antenna-antenna interactions in the
PSII-LHCII supercomplex. a, The interface between LHCII and
CP43/D1. Their interactions are mediated by PsbW and PsbI. PsbW
(CTR), C-terminal region of the PsbW subunit. b, Interactions between
CP29 and CP47/D1. M4-5 loop, loop region between the fourth and
fifth transmembrane helices of D1; M3 and M4, third and fourth
transmembrane helices of CP47, respectively; M2-3 loop, loop between
the second and third transmembrane helices of CP47; PsbH (NTR),
N-terminal region of the PsbH subunit. c, Interactions between CP26
and CP43/PsbZ. d, Contacts between LHCII and CP29’. e, CP26-LHCII
interactions.


For CP29, three carotenoid and thirteen chlorophyll binding sites
were located. Despite their overall similarity, there are evident differences
between the cryo-EM structure of full-length CP29 and the
crystal structure of CP29 without its N-terminal domain (ref. 14) (Extended
Data Fig. 6a, b). The long N-terminal region (87 amino acid residues) of
CP29 was unobserved in the previous crystal structure owing to its high
flexibility and proteolysis during crystallization (ref. 14). An earlier work using
electron paramagnetic resonance approaches suggested that this region
is potentially structured (ref. 28). The cryo-EM structure of CP29 shows that
this region forms two motifs with irregular coil structures (motifs I
and II) (Extended Data Fig. 6b). Motif I (Pro12-Lys41) superposes
well with the corresponding N-terminal region of LHCII (Extended
Data Fig. 6c). A chlorophyll density resembling Chl b601 of LHCII is
observed in this region (Extended Data Fig. 6c, d). It is located near
Chl a611 with a Mg-to-Mg distance of 12.0 Å and is tentatively assigned
as a Chl a. Its central ligand is contributed by the carbonyl of Trp14
from the N-terminal region of CP29. Motif II (Pro42-Phe87) forms
an L-shaped structure containing an approximately 40 Å-long hairpin
loop (Pro42-Ser72) running nearly parallel to the stromal surface,
and a short hairpin loop (Ala73-Phe87) beneath the long hairpin
(Extended Data Fig. 6b). Strikingly, a new chlorophyll-binding site is
found attached to the short hairpin in motif II and located at the interfacial
region facing CP47 (Extended Data Fig. 6e). This chlorophyll
is coordinated by the backbone carbonyl of Leu80 and is tentatively
assigned as a Chl a with its binding site named 616. Chl a616 in CP29
superposes partly with Chl a617 in Lhca3 and Lhca4 of the PSI-LHCI
supercomplex (refs 29, 30) (Extended Data Fig. 6f), suggesting that these
chlorophylls may serve similar roles as interfacial chlorophylls facilitating
energy transfer between two adjacent antenna complexes.

For CP26 in the PSII-LHCII supercomplex, thirteen chlorophyll
binding sites and three carotenoid binding sites were observed
(Extended Data Table 2 and Extended Data Fig. 7). Among them, Chls
b601, a604, b607 and b608 were not predicted by the previous functional
study (ref. 23), as their central ligands are contributed by either a protein
backbone carbonyl group (601) or potential water molecules (604, 607
and 608). Their identities are tentatively assigned according to their
similarity to the corresponding sites in LHCII. The carotenoids are
assigned as two luteins (L1 and L2) and one neoxanthin (N1). Although
the L1 site is mainly occupied by lutein, the L2 site can also accept
violaxanthin as well as lutein”, and the N1 site binds neoxanthin (ref. 31).


Antenna-PSII core assembly 

Efficient transfer of excitation energy from the peripheral antenna complexes
to the PSII core relies on their specific non-covalent interactions.
The association of the LHCII trimer with the core complex is mediated
by PsbW (Fig. 4a) and the interfacial lipid molecules (Extended Data
Fig. 8b). On the stromal side, the LHCII trimer binds to PsbW through
hydrophobic interactions between its Chl a611-a612 pair and Trp117/
Phe121 from PsbW. On the luminal side, Asn88 from LHCII is hydrogen-bonded
to Trp107 and Asn103 on PsbW. Meanwhile, Leu84_LHCII
forms van der Waals contacts with Trp107_PsbW. To further connect
LHCII with the core, the transmembrane helix of PsbW forms extensive
hydrophobic interactions with PsbI located on the side opposite
to the LHCII binding site. PsbI simultaneously interacts with the first
transmembrane helix of the D1 subunit and PsbW. In addition, the
N-terminal region of PsbW extends to the luminal surface and interacts
with PsbO and D1, while its C-terminal region is located on the stromal
surface and binds to the loop region between the fourth and fifth transmembrane
helices of CP43. In PsbW knockout plants, the PSII-LHCII
supercomplexes are destabilized and could not be detected”°. The location
of PsbW at the interface between the LHCII trimer and the core
complex explains its essential role for the formation of the PSII-LHCII
supercomplex. In addition, numerous lipid-like densities are found at
the interfaces between PsbW and LHCII and between CP43 and LHCII,
enhancing the association of LHCII with the core complex (Extended
Data Fig. 8b).

Both CP29 and CP26 contribute to the proper assembly and stability
of the PSII-LHCII supercomplex*”*. The transmembrane domain of
CP29 directly associates with that of CP47 through hydrophobic interactions.
Notably, Chl a616, the Chl a603 and a609 pair and Chl b607
from CP29 make several contacts with hydrophobic residues from
the third and fourth transmembrane helices of CP47 (Fig. 4b). Above
Chl a616, the long hairpin loop in motif II of CP29 binds to the stromal
surface of CP47 through hydrogen bonds and van der Waals interactions.
This loop further reaches out to contact the Thr228-Asn230
region on a surface loop of the D1 subunit. Moreover, the elongated
N-terminal region of the PsbH subunit forms a loop-helix-loop structure
and intercalates between motif II of CP29 and the loop between the
second and third transmembrane helices of CP47. Thus, PsbH secures
the interactions between CP29 and CP47 (Fig. 4b).

CP26 interacts specifically with CP43 and PsbZ through its
N-terminal and C-terminal regions as well as the chlorophylls
(Chl b601 and Chl a614) bound to these two regions (Fig. 4c). The
N-terminal region of CP26 (Pro36-Leu39) and Chl b601 (coordinated
by Phe34 in this region) form van der Waals contacts and hydrophobic
interactions with Chl a513 from CP43. The C-terminal region of
CP26 contains a short amphipathic α-helix between Leu230 and
Ile234 (named helix F), and a loop named the DF loop (Pro225-Asn229)
preceding helix F. The DF loop forms van der Waals contacts
with Phe182 from CP43, and is further bridged to the luminal
surface of CP43 through interfacial lipid molecules (Extended Data
Fig. 8b). Helix F is sandwiched between helix A_CP26 and the second
α-helix of PsbZ (Fig. 4c). It binds to the C-terminal region of PsbZ
through hydrophobic interactions and a hydrogen bond
(Leu231_CP26-Ser59_PsbZ). The second helix of PsbZ is positioned at the interface
between CP26 and CP43, and the first helix binds to CP43 on its
membrane-facing surface and interacts closely with the second helix so as to
provide rigid support for it. Thereby, PsbZ reinforces the association
of CP26 with CP43. When psbZ is knocked out, the content of CP26
protein decreases markedly and the formation of the PSII-LHCII
supercomplex is deficient** °°.


Interactions between LHCII, CP29 and CP26 

The LHCII trimer serves as a bridge connecting CP26 with CP29 from 
the adjacent monomer (CP29’) of the dimeric supercomplex (Fig. 2a). 
The three monomers of the LHCII trimer within the supercomplex are 
not equivalent, as they are surrounded by different neighbours. Two of 
them, monomers A and B, interact with CP29’ and CP26, respectively, 
whereas the third monomer (C) is located at the peripheral region 
(contacting the moderately associated LHCII in the larger C2S2M2
supercomplex?®). As shown in Fig. 4d, e, monomers A and B are related
to the adjacent CP29’ and CP26, respectively, through pseudo-C2 sym- 
metry running through their interfaces. Monomer A of LHCII forms 
several contacts with CP29’ on both the stromal and luminal sides 
(Fig. 4d). On the stromal surface, Pro163 from the AC loop (between 
helices A and C) region of LHCII interacts with the trimethylcyclohexane- 
1,3-diol head group of neoxanthin from CP29’. Meanwhile, the neox- 
anthin from LHCII contacts Pro180 from the AC loop region of CP29’. 
On the luminal side, the EC loop and Chl b605 (bound to Val119 in this 
region) of LHCII associate with the Gly137-Pro141 segment of the EC 
loop of CP29’. The interactions between LHCII and CP29’ are further 
strengthened by the lipid-like molecules found in the interfacial gaps 
(Extended Data Fig. 8b). 
For monomer B of the LHCII trimer, the region between Ser160
and Pro163 in the AC loop binds to the Pro172-Gly174 region in the
AC loop of CP26 (Fig. 4e). Neoxanthin from LHCII monomer B is in
contact with Chl a612 of CP26. On the luminal side, Ser117 from the
EC loop of LHCII forms a hydrogen bond with the backbone amide
of Ala103 from the BE loop of CP26. Moreover, Chl b605 makes van
der Waals contacts with Phe98 and Phe101 from helix B of CP26.
Evidently, the AC loop, neoxanthin molecule and EC loop (and the
associated Chl b605) are important components for LHCII to bind and
recognize CP26 and CP29’ within the supercomplex.


Insights into energy transfer pathways 

Through the interfacial chlorophyll pairs located between the peripheral
and core antenna complexes, excitation energy can be transferred
from LHCII, CP26 and CP29 to CP43 or CP47. The LHCII trimer
contains three Chl b-rich clusters located at its monomer-monomer
interface (ref. 12), and two of these clusters are connected to CP26 and CP29’
(Fig. 5a). As Chl b has a higher energy level than Chl a, the energy transfer
between LHCII and CP26 (or between LHCII and CP29’) may flow
from the Chl b-rich regions of LHCII to adjacent Chl a-rich regions
in CP26 (or CP29’). For energy transfer between LHCII and CP29’,
Chl b605_monomerA/LHCII and Chl a604/b606_CP29’ from the luminal layer
form the closest inter-complex pair, with a Mg-to-Mg distance (D_centre)
of 17.7/18.6 Å, while Chl b608_monomerA/LHCII and Chl b608_CP29’ at the
stromal layer are connected with D_centre at 23.0 Å (Fig. 5b). Thus, the
Chl b-rich region of CP29’ joins with that of LHCII, facilitating energy
transfer between two adjacent monomers of the dimeric supercomplex,
presumably through Chl b605_monomerA/LHCII to Chl a604/b606_CP29’ and
b608_monomerA/LHCII to b608_CP29’ pathways. For energy transfer between
LHCII and CP26, Chl b608 from monomer B of the LHCII trimer is
connected to Chl a612 and Chl a610 from CP26 with D_centre at 21.8 and
21.9 Å, respectively, while Chl b605 from the luminal layer of LHCII
may transfer its excitation energy to Chl a604 of CP26 at 19.3 Å D_centre
(Fig. 5c).

The lowest energy-state chlorophylls in LHCII were attributed to the
Chl a610, a612 and a611 cluster and are known as the terminal emitter
domain®”*. The excitation energy equilibrated within the LHCII
trimer will be focused on this cluster*®. In the supercomplex, the
terminal emitter from monomer A of the LHCII trimer may transfer
its energy to Chl a506_CP43, which is located in a favourable orientation
(nearly parallel) and distance (17.1 Å D_centre) with respect to
Chl a611_monomerA/LHCII (Fig. 5d). Below Chl a611, Chl a614 from the
luminal layer of LHCII is connected to Chl a501_CP43 at 25.1 Å D_centre.
These two pathways form the basis of energy transfer between LHCII
and the core complex. In the absence of minor antenna complexes,
LHCII can transfer energy directly to the core complexes*', but the
functional connection between LHCII and the PSII core is severely
impaired in minor-antenna knockout mutant plants”.

For energy transfer between CP29 and CP47, Chl a616_CP29 is sandwiched
between Chl a609_CP29 and Chl a616_CP47 at 9.3 and 14.4 Å
D_centre, respectively (Fig. 5e). The closest edge-to-edge distance (D_edge)
between Chl a616_CP29 and Chl a609_CP29/a616_CP47 is 3.4/4.2 Å, indicating
that these chlorophylls form strongly coupled pairs. The interstitial
position of Chl a616_CP29 makes it a crucial linker, relaying the
transfer of excitation energy from CP29 to CP47. In addition, Chl a603
from CP29 is directly connected to Chl a610_CP47 at 18.6 Å D_centre.
At the luminal layer, energy may be transferred from Chl b607_CP29 to
Chl a607_CP47 at 19.1 Å D_centre. Alternatively, Chl b607_CP29 may transfer
its energy to Chl a603-a609_CP29, and the energy may be further relayed
by Chl a616_CP29 to Chl a616_CP47. Among these potential pathways,
Chl a616_CP29 to Chl a616_CP47 is likely to be the most efficient energy
transfer pathway between CP29 and CP47, as these two chlorophylls
are the most closely paired at the interface.
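
The distance argument above can be illustrated numerically. The sketch below assumes a Förster-type dependence of the transfer rate on the inverse sixth power of the Mg-to-Mg distance, with identical orientation factors and spectral overlap for every pair. That model is an assumption introduced here for illustration and is not stated in the text; it is also unlikely to hold quantitatively for strongly coupled pairs. The pathway labels and the function name are hypothetical.

```python
# Mg-to-Mg (centre-to-centre) distances in angstroms for the CP29 -> CP47
# routes quoted in the text.
PATHWAYS = {
    "a616_CP29 -> a616_CP47": 14.4,
    "a603_CP29 -> a610_CP47": 18.6,
    "b607_CP29 -> a607_CP47": 19.1,
}


def relative_forster_rates(distances):
    """Rates relative to the shortest pathway, assuming rate ~ 1 / R**6.

    Orientation factors and spectral overlap are taken as equal for all
    pairs, a simplification made only for this illustration.
    """
    shortest = min(distances.values())
    return {name: (shortest / r) ** 6 for name, r in distances.items()}


rates = relative_forster_rates(PATHWAYS)
for name, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{name}: relative rate {rate:.2f}")
```

Under these assumptions the two longer routes come out several-fold slower than the Chl a616_CP29 to Chl a616_CP47 route, in line with the qualitative ranking given in the text.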

Figure 5 | Energy transfer pathways from antenna complexes to reaction
centres. a, Distribution pattern of chlorophylls within the PSII-LHCII
supercomplex. Chl a and Chl b are coloured green and blue, respectively.
Arrows indicate potential energy transfer pathways between LHCII and
CP29’/CP26 (red), LHCII/CP29’/CP26 and core antennae (CP43/CP47’)
(magenta), and CP43/CP47’ and the reaction centre in D1/D2’ (cyan).
The five red ovals indicate the potential energy-quenching sites around the
Chl a611-Chl a612 pairs, and the black oval indicates the other potential
quenching site around the Chl a603-Chl a609 pair in CP29’. These
quenching sites might be activated under high-light conditions so as to
dissipate harmful excess energy. The dashed lines indicate the approximate
boundaries of each individual complex. b-d, The interfacial chlorophylls
supporting energy transfer between LHCII and CP29’ (b), LHCII and
CP26 (c), and LHCII and CP43 (d). e, f, The chlorophylls at the interfaces
between CP29 and CP47 (e), and CP26 and CP43 (f). The numbers
near the dashed lines indicate the Mg-to-Mg distances (Å) between two
adjacent chlorophylls. The interfacial chlorophylls are highlighted in
magenta.


CP26 interacts closely with CP43 and energy transfer between them
may occur through multiple potential pathways (Fig. 5f). Chl a611_CP26
forms a strongly coupled pair with the red-most Chl a612 (ref. 23) and
this pair is probably the terminal emitter in CP26. Chl a611 is connected
to Chl a512_CP43 at 18.2 Å D_centre and to Chl a513_CP43 at 17.2 Å
D_centre. Meanwhile, Chl b601_CP26 is coupled to Chl a513_CP43 at 4.8 Å
D_edge (with D_centre at 12.6 Å) and is also connected to Chl a512_CP43
at 19.2 Å D_centre. At the luminal layer, the excitation energy from
Chl a614_CP26 may be absorbed primarily by Chl a503_CP43 at 16.0 Å
D_centre. Thus, the energy transmitted from CP26 will be received by
the Chl a513-Chl a512 pair at the stromal layer, or by Chl a503 at the
luminal layer of CP43.

When the excitation energy from the peripheral antenna complexes
has been collected by the core antenna complexes, subsequent energy
transfer from CP47 or CP43 to the P680 special pair occurs through
the Chl a network located within CP43, CP47, D1 and D2, as indicated
in Fig. 5a. Under high-light conditions, clusters of pigment molecules
within the major and minor LHCIIs may serve as non-photochemical
quenching sites that dissipate harmful excess energy as heat’.
The potential quenching sites within the supercomplex are mainly
located at or near the interfaces between adjacent antenna complexes
(Fig. 5a). These locations are ideal for them to intercept and dissipate
excess energy before it reaches the reaction centre. Recently, biophysical
modelling studies have yielded preliminary information about the
kinetics of light harvesting in PSII-LHCII supercomplexes***”. Now,
the cryo-EM structure of the spinach PSII-LHCII supercomplex provides
a detailed framework of its highly sophisticated pigment network
and enables a deeper understanding of the kinetics and regulation of
light-harvesting processes within the supercomplex.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size, the experiments 
were not randomized and the investigators were not blinded to allocation during 
experiments and outcome assessment. 

Purification and characterization of spinach PSII-LHCII supercomplex. Grana
membranes were prepared from spinach leaves as described previously'®. For
purification of PSII-LHCII supercomplexes, 500 μg (in chlorophyll) of spinach grana
membrane was washed once with 1 mM EDTA, 10 mM HEPES (pH 7.5) and then
centrifuged at 16,000g for 10 min. Subsequently, the pellets were suspended in
10 mM HEPES (pH 7.5) and then solubilized at 0.5 mg ml⁻¹ Chl by adding an equal
volume of 0.6% dodecyl-α-D-maltoside (α-DDM) in 10 mM HEPES (pH 7.5) and
vortexing for 1 min. The solubilized sample was centrifuged at 16,000g for 10 min
to remove pellets and the supernatant was fractionated through sucrose-density
gradient ultracentrifugation at 247,600g, 4 °C for 21 h (Beckman SW41 rotor). Two
major bands (B8 and B9) of spinach PSII-LHCII supercomplexes, corresponding
to the Arabidopsis B8 and B9 bands", were obtained. For electron microscopy
studies, the B9 band with a Chl a/b ratio of around 2.9-3.1 was collected and
concentrated to 3 mg ml⁻¹ (in chlorophyll) by using a 100-kDa cutoff concentrator
during centrifugation.

The B9 sample was characterized through absorption and fluorescence
spectra measurements, SDS-polyacrylamide gel electrophoresis (PAGE),
high-performance liquid chromatography (HPLC) and oxygen-evolving activity assay.
The absorption spectra were measured at room temperature with a Hitachi U3010
spectrophotometer. The fluorescence spectra were recorded with a Hitachi F7000
fluorescence spectrophotometer at room temperature under 436, 473 and 500 nm
excitations. The absorption and fluorescence spectra indicate that a significant
amount of Chl b (exclusively in the peripheral antenna domain) is present in the
sample (Extended Data Fig. 1c, d). For protein composition analysis, 10-18%
gradient Tris-Tricine SDS-PAGE was performed in a vertical protein gel
electrophoresis system (Protean II, Bio-Rad) according to the protocol described
in an earlier report. Four major core subunits (PsbA, B, C and D), three large
extrinsic proteins (PsbO, P and Q), Lhcb1/2/4/5 and several small subunits were
detected by gel electrophoresis in the denatured B9 sample (Extended Data Fig. 1b).
Pigment composition was analysed through HPLC as previously described, with
slight modifications. The B9 sample was treated with 80% (v/v) acetone to extract
pigments from the supercomplexes and then injected into a C-18 reversed-phase
column (Alltech Allsphere ODS-2) in a Hitachi L2130 separation module equipped
with a Hitachi L2450 diode array detector. Individual pigments were identified by
the absorption spectrum of each elution peak. Pigment analysis indicated that the
sample contained Chl a, Chl b, β-carotene, lutein, neoxanthin and violaxanthin
(Extended Data Fig. 1e). The oxygen-evolving activity assay was performed with
a Chlorolab-2 oxygen electrode system. Grana membranes and B9 samples were
diluted to 10 µg ml⁻¹ (in chlorophyll) in 2 M betaine, 10 mM NaHCO₃, 10 mM
NaCl, 25 mM CaCl₂, 25 mM MES-NaOH (pH 6.5), 0.01% α-DDM. O₂ production
was measured at 25 °C using 3,773 µmol photons per m² per s white light. The
assay was supplied with 0.5 mM 2,5-dichloro-p-benzoquinone (DCBQ) as the electron
acceptor. The supercomplex sample exhibited oxygen-evolving activity at
75 ± 3 µmol O₂ per mg (Chl) per h, comparable to a similar sample prepared from
Arabidopsis previously.

Electron microscopy. Approximately 3.0-µl aliquots of 3 mg ml⁻¹ PSII-LHCII
supercomplex sample (B9) were applied to glow-discharged GIG holey carbon
grids (1.0 µm hole size, 400 mesh). The grid was flash-frozen in liquid ethane
at around 100 K using a semi-automatic plunge device (FEI Vitrobot IV) with a
blotting time of 3 s and blotting force of level 2 at 100% humidity, 16 °C. Sample
screening was performed on a Talos F200C 200-kV electron microscope equipped
with a 4K × 4K Ceta camera (FEI). The images used for structure determination
were collected on a direct electron device (FEI Falcon III) using integrating mode
in a 300-kV FEI Titan Krios electron microscope. A total of 1,774 micrographs
were recorded at a calibrated magnification of 103,704×, yielding a pixel size of 1.35 Å
on the detector (detector pixel size: 14 µm), with a dose rate of approximately
25 e⁻ Å⁻² s⁻¹ and a defocus range between 0.8 and 2.0 µm. Each exposure of 2 s
was dose-fractionated into 32 movie frames.
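
The imaging parameters quoted above can be cross-checked with a few lines of arithmetic. The Python sketch below recomputes the pixel size from the detector pixel and the calibrated magnification, and derives the total and per-frame electron dose. The Nyquist limit is a standard consequence of the pixel size rather than a value stated in the text, and all function and variable names are illustrative assumptions.

```python
# Values quoted in the Methods text.
DETECTOR_PIXEL_UM = 14.0   # physical detector pixel size (µm)
MAGNIFICATION = 103_704    # calibrated magnification
DOSE_RATE = 25.0           # approximate dose rate (electrons per Å^2 per s)
EXPOSURE_S = 2.0           # exposure time per micrograph (s)
N_FRAMES = 32              # movie frames per exposure


def pixel_size_angstrom(detector_pixel_um: float, magnification: float) -> float:
    """Pixel size at the specimen level in Å (1 µm = 10,000 Å)."""
    return detector_pixel_um * 1.0e4 / magnification


def total_dose(dose_rate: float, exposure_s: float) -> float:
    """Accumulated dose per micrograph in electrons per Å^2."""
    return dose_rate * exposure_s


pixel = pixel_size_angstrom(DETECTOR_PIXEL_UM, MAGNIFICATION)
dose = total_dose(DOSE_RATE, EXPOSURE_S)
dose_per_frame = dose / N_FRAMES
nyquist = 2.0 * pixel  # finest resolution the sampling can represent (Å)

print(f"pixel size      : {pixel:.2f} Å")
print(f"total dose      : {dose:.1f} e/Å^2")
print(f"dose per frame  : {dose_per_frame:.4f} e/Å^2")
print(f"Nyquist limit   : {nyquist:.2f} Å")
```

Under these numbers the sampling limit lies below the 3.2 Å nominal resolution reported for the final map, so the pixel size does not cap the reconstruction.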

Data processing, classification and reconstruction. A small data set was collected
on an FEI Talos electron microscope with a Ceta camera. Reference-free 2D
classification produced several distinct classes in which some of the class-averaged
images could be recognized as side views of the PSII-LHCII supercomplex.
Assuming that the two side views with the longest and shortest dimensions were
perpendicular to each other, an initial model of the complex was made by using
these two side views. The initial model, low-pass filtered to 60 Å, was refined with
the whole data set. The refinement yielded an 18 Å resolution map that was
similar to the previous result. This reconstruction map was rescaled and used as an
initial model for the refinement with the high-resolution data set collected on the
Titan Krios.

For the data set collected on the Titan Krios, the beam-induced motion of the
whole micrograph with 32 movie frames was corrected by MOTIONCORR. After
alignment, an averaged image of 32 frames was used to determine the defocus
value and the parameters of astigmatism by the program CTFFIND3. A subset of
around 4,000 particles from about 50 micrographs was first semi-automatically
picked using the program e2boxer.py (ref. 52). The images of the subset were
classified using reference-free 2D classification and eight of the class-averaged
images were selected as templates for an automatic particle-picking procedure in
the program RELION to process the whole data set. The 223,927 particles picked by
the program were manually screened to remove those from images with overlapped
particles and other bad particles. A total of 192,071 particles were kept for further
data processing. These remaining particle images were 2D classified and 182,586
images in good classes were subjected to 3D classification without imposing any
symmetry. The class of C2S2-type supercomplexes had 143,003 particle images,
while the other class of C2S-type supercomplexes had 39,583 images (excluded
from further refinement). After another round of 3D classification, 109,042 images
of the C2S2 supercomplex were kept for 3D refinement with two-fold symmetry
imposed and used to produce a 3D reconstruction map with a nominal resolution
of 3.5 Å. The resolution of the reconstruction was further improved to 3.2 Å by
local motion correction and particle polishing processes.
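
The particle counts reported for each processing stage can be tabulated to see how much of the data set survives to the final refinement. The following Python sketch only re-uses the counts quoted in the text; the stage labels and helper name are illustrative assumptions, not part of the authors' workflow.

```python
# Particle counts quoted in the text, in processing order.
STAGES = [
    ("automatically picked", 223_927),
    ("after manual screening", 192_071),
    ("good 2D classes", 182_586),
    ("C2S2 class (first 3D classification)", 143_003),
    ("C2S2 kept for refinement (second 3D classification)", 109_042),
]
C2S_CLASS = 39_583  # C2S-type particles, excluded from further refinement


def retention(stages):
    """Return (label, count, fraction of the initially picked particles)."""
    initial = stages[0][1]
    return [(label, count, count / initial) for label, count in stages]


table = retention(STAGES)
for label, count, fraction in table:
    print(f"{label:<52s} {count:>8,d}  {fraction:6.1%}")

# The two 3D classes together account for every particle entering 3D classification.
classes_sum = STAGES[3][1] + C2S_CLASS
final_fraction = table[-1][2]
print("3D classes sum to 2D-selected set:", classes_sum == STAGES[2][1])
```

Slightly under half of the automatically picked particles contribute to the final map.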
Model building and refinement. For model building, crystal structures of 
Thermosynechococcus vulcanus PSII (TvPSII, PDB codes: 3WU2 and 4IL6), spinach
LHCII (PDB code: 1RWT), CP29 (PDB code: 3PL9), PsbP (PDB code: 4RTI) and
PsbQ (PDB code: 1VYK) were first manually fitted into the 3.2 Å cryo-EM map in
UCSF Chimera or COOT, and then manually adjusted in COOT. The densities
of PsbP and PsbQ in the cryo-EM map are weaker (but clearly distinguishable) than
those of the membrane-intrinsic core subunits (D1, D2, CP43 and CP47), indicating that
they may have relatively low occupancy or high mobility. The amino acid sequences
of the TvPSII structural model were mutated to their counterparts in Spinacia
oleracea. The PsbU, PsbV and PsbYcf12 subunits, which are present in TvPSII but
absent from higher plants, were deleted during model building. The N-terminal 
region of CP29 was built manually on the basis of the well-defined continuous
electron density of its main chain, and the sequence was registered according to 
the bulky side-chain densities in this region. Similarly, de novo model building 
was performed on PsbW and PsbTn. The atomic model of CP26 was mutated 
from a LHCII monomer and refined according to the cryo-EM map. Among the 
15 known low-molecular-mass constituent subunits (14 intrinsic subunits and 
1 extrinsic subunit) of plant PSII, 13 have been located in our cryo-EM map of the 
PSII-LHCII supercomplex. The two unidentified subunits are PsbR and PsbY. PsbR
is a 10-kDa protein that is involved in binding PsbP and is essential for optimal
oxygen-evolving activity of PSII (ref. 56). PsbY was located near PsbE and PsbF in
the Sr-substituted TvPSII (ref. 57). In the spinach PSII-LHCII supercomplex, no
strong protein density corresponding to PsbR or PsbY was observed, indicating
that they might be lost during purification. PsbS is essential for photoprotection
through non-photochemical quenching. Although the B9 sample used for the
cryo-EM study contained PsbS protein, it was not observed in the electron density
map of the supercomplex, probably owing to its nonspecific association with the
supercomplex, as explained previously.

The structure model of the spinach PSII-LHCII supercomplex was first refined 
in real space against the cryo-EM map by Phenix 1.9 (ref. 60) with geometry 
and secondary structure restraints. During real space refinement, the distances 
between the central magnesium ions of chlorophyll molecules and the coordinating 
ligands were restrained according to the values obtained from the high-resolution 
crystal structures. In addition, refinement in reciprocal space was performed in 
REFMAC with stereo-chemical and homology-derived restraints using
modified scripts of the program adapted for the cryo-EM map. Automatic real-space
and reciprocal-space refinements followed by manual correction in COOT were
carried out iteratively until there were no further improvements in either the R factor or
the geometry parameters. The statistics for data collection and structure refinement
are summarized in Extended Data Fig. 2c. 
Extended Data Figure 1 | Purification and characterization of the
spinach PSII-LHCII supercomplex. a, Sucrose gradient of solubilized
grana membranes. The membrane preparations were first washed with
1 mM (left) or 5 mM (right) EDTA before being solubilized by α-DDM
for further purification through sucrose-gradient ultracentrifugation.
The content of each band is indicated based on the absorption spectrum
and SDS-PAGE results, and by comparing to previously published data.
The B9 fraction obtained from the grana membrane washed with 1 mM
EDTA was used for cryo-EM. Note that the grana membranes washed
with 1 mM EDTA yielded less B5, B6 and B7 than the sample treated with
5 mM EDTA. b, SDS-PAGE analysis of the sucrose gradient fractions.
The protein composition of each Coomassie band is indicated based on
the mass spectrometry and proteomics data analysis. For gel source data,
see Supplementary Fig. 1. c, Room-temperature absorption spectrum of the B9
sample used for cryo-EM. Its spectrum (B9 1 mM) is compared to those of
B7 (dimeric PSII core without LHCII attached; B7 5 mM) and B9 samples
(B9 5 mM) fractionated from grana washed with 5 mM EDTA. Note that
B9 from grana membranes washed with 1 mM EDTA showed higher peaks
at 470 and 650 nm, indicating that this fraction contains higher Chl b
content (from LHCIIs) than the other two. The spectra are normalized to
the maximum in the red region. d, Fluorescence emission spectra of the B9
sample measured at room temperature. The maximum emissions were at
681 nm (upon excitation of Chl a at 436 nm), 680 nm (upon excitation of
Chl b at 473 nm) and 681 nm (upon excitation of carotenoids at 500 nm).
Overlapping of these three spectra suggests that nearly all pigments in the
B9 sample are well coupled and no free pigments are present. e, Pigment
content analysis of the B9 sample by HPLC. Based on the characteristic
absorption spectrum of each peak fraction, the six major pigment
peaks separated from the B9 sample are identified as neoxanthin (Neo),
violaxanthin (Vio), lutein (Lut), Chl b, Chl a and β-carotene (β-car).
Extended Data Figure 2 | Evaluation of the resolution of the cryo-EM
structure of the spinach PSII-LHCII supercomplex. a, Fourier shell
correlation (FSC) plots. Blue, gold-standard FSC curve with a value of
0.143 at 3.2 Å resolution; red, FSC curve calculated between the cryo-EM
map and the refined structure model of the PSII-LHCII supercomplex.
The map-model FSC has a value of 0.5 at 3.3 Å resolution. b, Local
resolutions of the cryo-EM map of the spinach PSII-LHCII supercomplex
estimated by Resmap. Top, side view along the membrane plane with the
luminal domain facing upwards. Bottom, bottom view from the luminal
side and approximately along the membrane normal (or C2 axis). c, The
statistics of the structural model of the spinach PSII-LHCII supercomplex
refined against the 3.2 Å resolution cryo-EM map.
Extended Data Figure 3 | Structures of the large and small intrinsic
subunits of spinach PSII. a, The four large intrinsic subunits of the
spinach PSII core superposed on the corresponding subunits of the TvPSII
core. The protein backbones and cofactors are shown as ribbon and stick
models, respectively. Silver, spinach PSII core subunits; green, TvPSII core
subunits. b, The locations of 12 low-molecular-mass intrinsic subunits in
the spinach PSII-LHCII supercomplex. These subunits are coloured and the
rest of the supercomplex is grey. c, The densities for the low-molecular-mass
intrinsic subunits are shown as blue meshes. The corresponding models
are shown as cyan sticks.
Extended Data Figure 4 | Cryo-EM densities and structures of the
extrinsic subunits. a, Cryo-EM densities of PsbO, PsbP, PsbQ and PsbTn.
b, The binding sites of spinach PsbP, PsbQ and PsbO compared to those
of the extrinsic subunits in CcPSII and TvPSII. The spinach PSII core is
shown at an angle identical to that of the CcPSII/TvPSII core. PDB codes:
4YUU (CcPSII); 3WU2 (TvPSII). c, Superposition of PsbP bound in the
supercomplex with the isolated PsbP. Colour code: yellow, PsbP in the
supercomplex; blue, isolated PsbP (PDB code: 4RTI). Loop 3A and Loop 4A
indicate the loop regions between Lys90 and Ala111 and between Arg134
and Gly142, respectively. Note the conformational change in Loop 3A
(arrow) when PsbP binds to the PSII core. d, Structure of PsbQ bound
in the supercomplex superposed with the isolated PsbQ. Green, PsbQ
in the supercomplex; red, isolated PsbQ (PDB code: 1VYK). Note the
conformational change in the elongated N-terminal region from a folded
state to an extended form (arrow) when PsbQ binds to the PSII core.
Extended Data Figure 5 | Cryo-EM density and structure of LHCII.
a, Cryo-EM densities of the LHCII trimer in the supercomplex. Stereo
pairs are shown and the view is along the membrane plane.
b, Superposition of the cryo-EM structure of an LHCII monomer with
the previous crystal structure (PDB code: 1RWT). The protein backbone is
shown as ribbon diagrams and the cofactors are displayed as stick models.
Green, cryo-EM structure; yellow, crystal structure.
Extended Data Figure 6 | Cryo-EM density and structure of CP29.
a, Stereo image of the cryo-EM density of CP29 bound in the PSII-LHCII
supercomplex. b, Superposition of cryo-EM structure of full-length CP29
with the previous crystal structure. Note: Chls a601 and a616 are newly
observed in the cryo-EM structure of CP29. Chl a601 might account
for the electron density of Chl a615 observed in the crystal structure of
spinach CP29 (ref. 14). Compared to Chl a601, a615 is much closer to
a611 owing to the loss of the N-terminal domain caused by proteolysis.
Chl b614 is a peripheral chlorophyll found in the crystal structure, but
is probably lost during purification and therefore not observed in the
cryo-EM structure. Orange, cryo-EM structure; cyan, crystal structure.
c, Superposition of CP29 (orange) with the structure of an LHCII monomer
(green). For the cofactors, only Chl a601 is shown; the others are omitted
for clarity. d, Cryo-EM density of Chl a601 in CP29. e, Cryo-EM density of
Chl a616 at the interface between CP29 and CP47. f, Superposition of Lhca3
and Lhca4 structures with that of CP29 in the PSII-LHCII supercomplex.
Chl a616 (CP29) and a617 (Lhca3/4) molecules are shown as stick models;
the other cofactors are omitted for clarity. Orange, CP29; magenta, Lhca3;
blue, Lhca4. PDB codes: 4XK8, Lhca3 and Lhca4 from the PSI-LHCI
supercomplex; 1RWT, LHCII.
Extended Data Figure 7 | Cryo-EM density and structure of CP26
bound in the PSII-LHCII supercomplex. a, Stereo images of the density
and overall structure of CP26. The density is shown as grey meshes and
the model is in purple. The protein backbone is shown as a ribbon model;
the cofactors are presented as stick models. b, The densities for Chl b601,
Chl a604, Chl b607 and Chl b608 in CP26. These four chlorophylls were
not predicted in the previous work, but are clearly present in the structure.
c, The density for three carotenoids in CP26. Note that the density for the
epoxidized head group of neoxanthin is clearly visible, while the rest of it is
fairly weak (presumably owing to low occupancy or high flexibility).
Extended Data Figure 8 | Cryo-EM densities of various cofactors
bound in the spinach PSII-LHCII supercomplex. a, The densities of
chlorophylls, carotenoids, Mn4CaO5 and plastoquinone molecules. b, The
potential lipid densities at the interfacial regions between adjacent antenna
complexes. The interfaces between LHCII and PsbW, CP26 and CP43,
and LHCII and CP29 are shown from left to right. Red arrows indicate the
positions of potential lipid densities. The cryo-EM densities are displayed
as grey meshes and the atomic models for interpretation of the densities
are shown as sticks and bullets.
Extended Data Table 1 | Cofactors located within each monomer of the spinach PSII-LHCII supercomplex


Protein 


D1 (PsbA) 


D2 (PsbD) 


CP47 (PsbB) 


CP43 (PsbC) 


PsbH 

PsbK 

PsbL 

PsbZ 

Cyt b559 (PsbE & F)


LHCII trimer 


CP29 


CP26 


Total 


chlorophyll 


4 Chl a and 2 pheophytin


2Chla 


16 Chla 


13 Chla 


42 (24 Chl a and 18 Chl b)


13 (10 Chl a and 3 Chl b)


13 (9 Chl a and 4 Chl b)


105 


carotenoid 


1 BCR 


1 BCR 


3 BCR 


3 BCR 


1 BCR 


1 BCR 


12 (6 Lut, 3 Vio, 3 Neo)

3 (1 Lut, 1 Vio, 1 Neo)


3 (2 Lut, 1 Neo) 


28 


haem 


lipid others 

2 SQDG, 1 MGDG; 1 Mn4CaO5 cluster, <1 plastoquinone (weak density due to partial occupancy on QB site)

3 PG, 1 MGDG; 1 plastoquinone (strong density on QA site)

1 SQDG 

1 MGDG 

3 DGDG 

1 MGDG 

1 DGDG 

1 PG

1 MGDG

3 PG

1 PG

1 PG

21 2-3 




BCR, β-carotene; DGDG, digalactosyldiacyl glycerol; Lut, lutein; MGDG, monogalactosyldiacyl glycerol; Neo, neoxanthin; PG, phosphatidyl glycerol; SQDG, sulfoquinovosyldiacyl glycerol; Vio, violaxanthin.
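
The column totals in Extended Data Table 1 can be checked by summing the individual entries. The Python sketch below does this for the chlorophyll, carotenoid and lipid columns. It assumes, from the arithmetic alone, that the chlorophyll total of 105 counts the two pheophytins of D1; the carotenoid and lipid entries are simply taken in the order they are listed, without asserting which small subunit each belongs to.

```python
# Chlorophyll-type pigments per protein, as listed in the table.
# Assumption: the two pheophytins of D1 are counted in the column total.
chlorophyll_column = {
    "D1 (4 Chl a + 2 pheophytin)": 4 + 2,
    "D2": 2,
    "CP47": 16,
    "CP43": 13,
    "LHCII trimer (24 Chl a + 18 Chl b)": 24 + 18,
    "CP29 (10 Chl a + 3 Chl b)": 10 + 3,
    "CP26 (9 Chl a + 4 Chl b)": 9 + 4,
}

# Carotenoid entries in listed order: six beta-carotene rows, then LHCII, CP29, CP26.
carotenoid_column = [1, 1, 3, 3, 1, 1, 12, 3, 3]

# Lipid entries in listed order (each row's lipid species summed).
lipid_column = [2 + 1, 3 + 1, 1 + 1, 3 + 1, 1, 1, 1, 3, 1, 1]

chlorophyll_total = sum(chlorophyll_column.values())
carotenoid_total = sum(carotenoid_column)
lipid_total = sum(lipid_column)

print("chlorophyll-type pigments:", chlorophyll_total)
print("carotenoids:", carotenoid_total)
print("lipids:", lipid_total)
```

All three sums reproduce the totals printed in the last row of the table.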


© 2016 Macmillan Publishers Limited. All rights reserved 




Extended Data Table 2 | Pigment binding sites of spinach LHCII, CP29 and CP26 in the PSII-LHCII supercomplex

Peripheral antenna complexes | LHCII | CP29 | CP26

Chlorophylls*
601 | Chl b (Tyr24) | Chl a (Trp14)† | Chl b (Phe34)‡
602 | Chl a (Glu65) | Chl a (Glu96) | Chl a (Glu78)
603 | Chl a (His68) | Chl a (His99) | Chl a (His81)
604 | Chl a (H2O) | Chl a (H2O) | Chl a (putative H2O)‡
605 | Chl b (Val119)[Gln122, Ser123]§ | - | -
606 | Chl b (H2O)[H2O] | Chl b (H2O)[putative H2O] | Chl b (putative H2O)[putative H2O]
607 | Chl b (H2O)[Gln131] | Chl b (H2O)[Glu154] | Chl b (putative H2O)[Glu142]‡
608 | Chl b (H2O)[Leu148] | Chl b (H2O)[Gln161] | Chl b (H2O)‡
609 | Chl b (Glu139)[Gln131] | Chl a (Glu159) | Chl a (may also accept Chl b) (Glu150)
610 | Chl a (Glu180) | Chl a (may also accept Chl b) (Glu197) | Chl a (Glu189)
611 | Chl a (PG) | Chl a (PG) | Chl a (PG)
612 | Chl a (Asn183) | Chl a (His200) | Chl a (Asn192)
613 | Chl a (Gln197) | Chl a (Gln214) | Chl a (Gln206)
614 | Chl a (His212) | ‖ | Chl a (His221)
615 | - | ‖ | -
616 | - | Chl a (Leus0)‡ | -
Chl a/b ratio (structural model) | 1.33 | 3.3 (or 2.5 if Chl b614 is present) | 2.3 (may be lower if 609 is a Chl a/b mixed site)
Chl a/b ratio (biochemical analyses)¶ | 1.3-1.4 | 2.5-3.0 | 2.1-3.3

Carotenoids
L1 | lutein | lutein | lutein
L2 | lutein | violaxanthin | lutein (may also accept violaxanthin)
N1 | neoxanthin | neoxanthin | neoxanthin
V1 | violaxanthin | - | -

The local resolution of our cryo-EM map has an uneven distribution, as shown in Extended Data Fig. 2b. The core region has a relatively higher resolution (3.0-3.5 Å) than the regions of the peripheral
antenna system (3.2-4.0 Å), sufficient to identify the number of pigment molecules bound to each antenna complex and locate their positions. The identities of chlorophylls (Chl a or Chl b) and
carotenoids (lutein, neoxanthin or violaxanthin) are assigned mainly by referring to the information obtained from previous work on the high-resolution crystal structures of spinach LHCII and CP29
(ref. 14) and the functional architecture of CP26 (ref. 23).

*Central ligands of chlorophylls coordinating the Mg atoms are shown in parentheses.

†As the N-terminal region of CP29 is intact in the cryo-EM structure, its 601 site is occupied by a chlorophyll (tentatively assigned as Chl a) coordinated by Trp14 (corresponding to Tyr24 in LHCII).
Owing to proteolysis at the N-terminal region, the chlorophyll at the 601 site in the previous crystal structure of CP29 might have shifted to the nearby 615 site, sharing the same ligand with Chl a611.

‡Newly identified chlorophyll-binding site in CP29 or CP26.

§The hydrogen bond donors of the C7-formyl group of Chl b molecules are shown in square brackets.

‖These sites in the previous crystal structure of CP29 were occupied by a Chl b (614) and Chl a (615) coordinated by His229 (614) and glycerol-3-phosphate, respectively. They are not observed in the
cryo-EM structure reported here. Chl b614 is located at a peripheral site in contact with detergent and might be lost during purification, leading to a vacant site without chlorophyll bound.

¶These data were extracted and summarized from previous publications.
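
The structural-model Chl a/b ratios quoted in Extended Data Table 2 follow directly from counting the site assignments. The Python sketch below transcribes the chlorophyll type at each occupied site and recomputes the ratios, including the two alternatives mentioned in the table (Chl b614 present in CP29, and site 609 of CP26 treated as Chl b). The dictionary layout and function name are illustrative assumptions.

```python
# Chlorophyll type ("a" or "b") at each occupied site, transcribed from the table.
SITES = {
    "LHCII": {601: "b", 602: "a", 603: "a", 604: "a", 605: "b", 606: "b", 607: "b",
              608: "b", 609: "b", 610: "a", 611: "a", 612: "a", 613: "a", 614: "a"},
    "CP29": {601: "a", 602: "a", 603: "a", 604: "a", 606: "b", 607: "b", 608: "b",
             609: "a", 610: "a", 611: "a", 612: "a", 613: "a", 616: "a"},
    "CP26": {601: "b", 602: "a", 603: "a", 604: "a", 606: "b", 607: "b", 608: "b",
             609: "a", 610: "a", 611: "a", 612: "a", 613: "a", 614: "a"},
}


def chl_ab_ratio(sites):
    """Ratio of Chl a sites to Chl b sites for one antenna complex."""
    n_a = sum(1 for kind in sites.values() if kind == "a")
    n_b = sum(1 for kind in sites.values() if kind == "b")
    return n_a / n_b


ratios = {name: chl_ab_ratio(sites) for name, sites in SITES.items()}

# Alternatives discussed in the table.
cp29_with_b614 = chl_ab_ratio({**SITES["CP29"], 614: "b"})
cp26_609_as_b = chl_ab_ratio({**SITES["CP26"], 609: "b"})

for name, value in ratios.items():
    print(f"{name}: Chl a/b = {value:.2f}")
print(f"CP29 with Chl b614 present: {cp29_with_b614:.2f}")
print(f"CP26 with site 609 as Chl b: {cp26_609_as_b:.2f}")
```

The counts give 8:6 for an LHCII monomer, 10:3 for CP29 and 9:4 for CP26, matching the per-complex chlorophyll numbers in Extended Data Table 1.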
Regulation of black-hole accretion by a disk wind
during a violent outburst of V404 Cygni


T. Muñoz-Darias, J. Casares, D. Mata Sánchez, R. P. Fender, M. Armas Padilla, M. Linares, G. Ponti,
P. A. Charles, K. P. Mooley & J. Rodriguez


Accretion of matter onto black holes is universally associated with 
strong radiative feedback and powerful outflows. In particular,
black-hole transients have outflows whose properties are strongly
coupled to those of the accretion flow. This includes X-ray winds
of ionized material, expelled from the accretion disk encircling the
black hole, and collimated radio jets. Very recently, a distinct
optical variability pattern has been reported in the transient
stellar-mass black hole V404 Cygni, and interpreted as disrupted
mass flow into the inner regions of its large accretion disk. Here
we report observations of a sustained outer accretion disk wind 
in V404 Cyg, which is unlike any seen hitherto. We find that the 
outflowing wind is neutral, has a large covering factor, expands 
at one per cent of the speed of light and triggers a nebular phase 
once accretion drops sharply and the ejecta become optically thin. 
The large expelled mass (>10⁻⁸ solar masses) indicates that the
outburst was prematurely ended when a sizeable fraction of the 
outer disk was depleted by the wind, detaching the inner regions 
from the rest of the disk. The luminous, but brief, accretion phases 
shown by transients with large accretion disks imply that this
outflow is probably a fundamental ingredient in regulating mass 
accretion onto black holes. 

The X-ray binary V404 Cyg (GS 2023+338) is a confirmed stellar-mass
black hole with a precisely determined distance from Earth of
2.4 kpc (ref. 9). After 25 years of quiescence, NASA's Swift mission
detected renewed activity on 15 June 2015, initiating a two-week
period of intense and violently variable emission across all wavelengths.
Our high signal-to-noise optical spectra covering the
entire X-ray/radio-active phase (~15 days) show that, contemporaneously
with radio jet emission, continuous ejections of neutral material
at ~0.01c, where c is the speed of light in a vacuum, are present from
low-level accretion phases (<1% of the Eddington luminosity $L_{\rm Edd}$)
to the X-ray peak (Methods; Fig. 1, Extended Data Fig. 1). These are
observed in hydrogen (Balmer) and helium (He I) emission lines as
deep P Cyg profiles throughout the outburst and extremely broad
wings once the X-ray and radio fluxes decay. P Cyg profiles result
from resonant scattering in an expanding outflow with a spherical
geometry or at least sustaining a large solid angle (Methods). Of
a dozen transitions showing this feature, the deepest are seen in the
He I λ = 5,876 Å emission line, which is used as a reference for this
study (see Extended Data Fig. 2).

The strongest P Cyg profiles are witnessed during days 1 to 6 (Fig. 1
and Fig. 2 for the evolution of the profiles during day 2; see Methods),
when the X-ray luminosity is typically 1,000 times fainter than the
~$L_{\rm Edd}$ flares displayed later in the outburst (Extended Data Fig. 1).
Blue-shifted absorptions are as deep as 30% below the continuum
level and we measure terminal velocities in the range $V_{\rm T}$ = 1,500-3,000 km s⁻¹
(Figs 1 and 2, Extended Data Figs 2 and 3). Symmetric
red-shifted (that is, positive velocity) outflow emission, completely
detached from the accretion disk line component, is sometimes
evident (see Fig. 2 from minute 60 onwards).
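
To give a sense of scale for these velocities, the Python sketch below converts the quoted terminal velocities into wavelength shifts of the He I reference line using the non-relativistic Doppler relation, which is an adequate approximation at about 1% of the speed of light. The function names are illustrative assumptions; only the rest wavelength and the velocity range come from the text.

```python
C_KM_S = 299_792.458      # speed of light (km/s)
HE_I_REST_A = 5876.0      # rest wavelength of the He I reference line (Å)


def doppler_shift_angstrom(rest_wavelength_a: float, velocity_km_s: float) -> float:
    """Magnitude of the wavelength shift for a given line-of-sight speed (v << c)."""
    return rest_wavelength_a * velocity_km_s / C_KM_S


def blue_edge_angstrom(rest_wavelength_a: float, terminal_velocity_km_s: float) -> float:
    """Wavelength of the blue edge of the P Cyg absorption trough."""
    return rest_wavelength_a - doppler_shift_angstrom(rest_wavelength_a, terminal_velocity_km_s)


shifts = {v: doppler_shift_angstrom(HE_I_REST_A, v) for v in (1500.0, 3000.0)}
for v, shift in shifts.items():
    edge = blue_edge_angstrom(HE_I_REST_A, v)
    print(f"V_T = {v:6.0f} km/s -> shift {shift:5.1f} Å, blue edge near {edge:7.1f} Å")

fraction_of_c = 3000.0 / C_KM_S
print(f"3,000 km/s corresponds to {fraction_of_c:.4f} c")
```

The upper end of the measured range corresponds to about 0.01c, in line with the ejection speed quoted earlier.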

To trace the ionization state of the outer disk, we computed the line
flux ratio $I_{\rm ratio}$ = He II (λ = 4,686 Å)/Hβ and obtain $I_{\rm ratio}$ < 0.5 when
P Cyg absorptions are deepest (Extended Data Fig. 1; Methods).
Through the outburst, both the X-ray and optical emission are characterized
by the presence of short and long flaring activity. During
these flaring episodes, $I_{\rm ratio}$ increases while the P Cyg profiles become
weaker, subsequently recovering their pre-flare strength when the
X-ray flux and $I_{\rm ratio}$ drop (Fig. 2). This indicates that the detection of
P Cyg absorptions is driven by ionization effects. Indeed, on days 7 to
10 much shallower absorptions (only ~2% below the continuum level)
are witnessed as the system enters the brightest phase of the outburst
and $I_{\rm ratio}$ is always larger than unity (Fig. 1 and Extended Data
Fig. 1). Furthermore, we note that the Hα profile is very asymmetric
during the whole outburst, providing a further indication of the ubiquitous
presence of wind outflows during our observations (Extended
Data Fig. 4; Methods).

The low temperature T that is required to have both neutral hydrogen
(T < 10⁴ K) and helium (T < 3 × 10⁴ K) places the wind-launching
radius $R_{\rm l}$ at the outer accretion disk regardless of the wind-launching
mechanism. On the other hand, the low luminosity associated with
the deepest P Cyg profiles rules out radiation-pressure winds driven
by Thomson scattering. The thermal wind scenario, in which $V_{\rm T}$
roughly corresponds to the escape velocity at $R_{\rm l}$, is able to reproduce
our observations. Using $V_{\rm T}$ = 1,500-3,000 km s⁻¹, we obtain $R_{\rm l}$ =
(1.5-6) × 10⁵ km, which corresponds to disk temperatures in the range
~5,000-30,000 K for luminosities within the range 0.001-0.1 $L_{\rm Edd}$
(Methods). A crude estimation of the mass-loss rate associated with
the most conspicuous profiles (day 2) suggests $\dot{M}_{\rm out} > 10^{-8}\,M_\odot$ yr⁻¹
(Methods). This lower limit accounts only for neutral matter outflows,
but a wind of ionized material could also be launched. A hot wind with
$V_{\rm T}$ up to ~4,000 km s⁻¹ is indeed detected by the only Chandra pointing
performed during the outburst.

The second signature of the high-velocity wind is a short nebular
phase witnessed at the end of the outburst. Following a sharp drop by
a factor of ~1,000 in the X-ray, optical and radio luminosity from the
major flares that end the brightest phase of the outburst on days 9 and
10, the Balmer lines became unprecedentedly intense for a black hole,
showing equivalent widths up to ~2,000 Å (Hα; Extended Data Fig. 1).
They sit on extended wings reaching similar velocities to the $V_{\rm T}$
observed in the P Cyg profiles (±3,000 km s⁻¹; inset in Fig. 3). A forest
of broad emission lines, such as Si II and Fe II, also appears (Fig. 3),
while the Balmer decrement (BD; the ratio between the Balmer
line fluxes; see Methods) increases up to ~6, as compared to ~2.5
observed earlier (Extended Data Figs 1 and 5). High values of BD
Figure 1 | P Cyg profiles observed during days 2, 6, 7-9 and 10 in
He I λ = 5,876 Å. Normalized spectra are offset by 0, 0.6, 1.2 and 1.8,
respectively. Profiles are formed when atomic material approaching the
observer at velocity −$V_{\rm out}$ (with $V_{\rm out}$ the projected outflow velocity) scatters
photons with frequency ν = ν₀(1 − $V_{\rm out}$/c), while receding ejecta moving
at velocity $V_{\rm out}$ are being illuminated by the central source. Yellow shading
indicates regions contaminated by interstellar absorption. We detect
approaching material moving at up to 3,000 km s⁻¹ (blue-filled absorptions).
During days 7-9 and day 10 the profiles are very shallow, corresponding
to high ionization states (see text). Simultaneously with the blue-shifted
absorption we detect red-shifted emission detached from the accretion disk
component (see also Fig. 2). This red-shifted emission reached amplitudes
similar to that of emission lines produced by approaching material, a feature
indicating spherical geometry or at least a large covering factor.


are associated with nebulosities, as a result of neutral hydrogen
self-absorption in relatively low-density conditions (Methods). This
behaviour is expected when the outflow cools and expands, becoming
optically thin. The symmetric wings indicate a large covering factor
for the ejecta. Expanding nova shells are characterized by similar BD
values during some stages, as well as exhibiting some of the emission
lines detected here. These emission lines are also found in
low-excitation nebulosities surrounding outflowing massive stars, which
show similar Hα equivalent widths in their final and most violent
evolutionary phases. This phase is not witnessed after other strong
flares displayed early in the outburst (for example, day 4). However,
these events are not followed by a strong drop in flux, such as occurs
in the case of the major flares preceding the nebular phase (Extended
Data Fig. 1).

The timescale of the optically thick (P Cyg) to optically thin
(extended wings) transition is the diffusion timescale of an expanding
shell with mass $M_{\rm shell}$, and it is estimated to be
$t_{\rm diff} \approx 23\,{\rm days}\,(1/R_{15})(M_{\rm shell}/M_\odot)$,
where $M_\odot$ is the mass of the Sun and $R_{15}$ is the radius of
the spherical envelope in units of 10¹⁵ cm. For $t_{\rm diff}$ = 0.002-0.1 days,
which would be a conservative timescale relevant to the evolution of
the BD value, we obtain $M_{\rm shell} \approx (10^{-8}$-$10^{-5})\,M_\odot$. This is consistent
with the black hole blowing away a substantial fraction of the matter
stored in its large (mass of ~10⁻⁵ $M_\odot$) accretion disk. On the other
hand, this amount of mass is able to explain the increase in the equivalent
hydrogen column density (of up to $N_{\rm H} \approx 10^{24}$ cm⁻²) observed
during both the 1989 and the 2015 outbursts (Methods).

The active phase of the 2015 outburst of V404 Cyg is much shorter
(~15 days) than typically observed in other luminous black holes
(months to a year). This is followed by a sharp decay (~3 days), still
during the radio-loud phase of the outburst, directly after the X-ray
peak is reached. This behaviour is consistent with that observed in the
1989 outburst. During these brief outbursts only about 0.1% (that is,
(0.3–1.1) × 10⁻⁸ M⊙) of the material stored in the accretion disk is
accreted by the black hole. This corresponds to the gas kept in the
innermost ~(6–9) × 10⁵ km, which is unaffected by the long-lived outer
disk wind. The amount of mass transferred from the donor star to the
accretion disk during the preceding 26 years of quiescence is estimated
to be −ΔM_2 ≈ 3 × 10⁻⁸ M⊙ (Methods). This amount is comparable to
that accreted by the black hole and ejected by the wind. We also detect
a prominent double-peaked Hα line right after the end of the nebular
phase, indicating the presence of a remnant accretion disk once the
most active phase of the outburst was finished. Strong Hα emission has
in fact been observed throughout the inter-outburst interval, and it is
formed at ~80% of the outer disk radius (0.8R_out = 7 × 10⁶ km). This
(quiescent) period is also characterized by an X-ray luminosity about
two orders of magnitude brighter than typically observed in quiescent
black holes, implying ongoing accretion from the remnant accretion
disk. Similarly, the much fainter secondary outburst detected in
December 2015 also indicates the presence of an active disk only six
months after the major outburst. This time lapse is consistent with the
viscous timescale for refilling the inner disk (Methods).

In contrast with the sparse optical data obtained during the 1989
outburst, the intense observing campaign presented here allowed
us to study in detail the evolution of the wind outflow and to detect
the short-lived nebular phase. A nebular phase might also have
occurred in the 1989 outburst (where intermittent P Cyg profiles were
detected) but have been missed because of the scarce monitoring.
On the other hand, the relative proximity of V404 Cyg enables both
detailed spectroscopic observations at luminosities as low as 10~-* L_Edd
and the detection of outflow features as weak as 2% of the continuum
level during the brightest phases. In addition, the large accretion disk
implied by the 6.5-day orbital period (the majority of black holes have
orbital periods shorter than ~2 days; ref. 27) provides the resource
for the formation of outer disk outflows. Besides V404 Cyg, the
behaviour of the other two systems with the longest orbital periods might
also be influenced by the presence of mass outflows. V4641 Sagittarii,
with P_orb = 2.8 days, has shown several brief outbursts characterized by
strong radio emission. Extended Hα wings (2,500 km s⁻¹) have been
reported in a low-luminosity observation, possibly with a weak P Cyg
profile in an Fe II emission line. Likewise, GRS 1915+105, the black
hole with the longest orbital period, has been permanently in outburst
for the past 23 years, alternating lower-luminosity plateau phases with
short luminous episodes lasting only a few weeks.

Figure 2 | Trailed spectrum corresponding to data from 19 June (day 2).
The trail (bottom panel) covers 103 min with 75 spectra. Time corresponds
to minutes from MJD 57,192.04. The normalized intensity scale is such that
absorptions are represented in blue colours, while emissions are plotted
in red colours. Simultaneous X-ray (Integral; blue stars), optical (green
dots) and radio (red squares) normalized light curves are shown in the top
panel. Outflows are detected throughout the observation, but their
properties change in response to flaring. The strongest features become
evident directly after a sharp X-ray flare is seen (dashed line), as soon
as the X-ray flux decreases and I_ratio reaches values as low as 0.5.
During the flare (at ~0.08 times the flux peak observed later in the
outburst), the P Cyg profile becomes weaker, as I_ratio increases to
values larger than unity (up to ~2).

Figure 3 | Spectral evolution towards the nebular phase. Average,
normalized spectra corresponding to days 9-11. A log scale has been used
to represent the intense Hα emission, which reached an equivalent width
of 2,000 Å. An offset of 0.1 and 0.2 has been added to the day-10 and day-11
spectra, respectively. The optical flux drops by two orders of magnitude,
corresponding to the decay of the X-ray and radio outburst (Extended
Data Fig. 1). He I and Balmer lines become intense and broad as other
transitions become evident (Si II, Fe II). The inset shows the Hα region,
where broad wings reaching ±3,000 km s⁻¹ become apparent.

It is interesting to note that both V404 Cyg and GRS 1915+105
share distinctive variability patterns in their X-ray and optical
emission, regardless of their differing luminosities. These include
short-term variations with large amplitudes, which, in addition to the
neutral wind outflows reported here, seem to be a common feature of
long-period black holes. This variability pattern has been proposed to result
from insufficient mass flow reaching the inner parts of large disks, and
it might also affect the outburst evolution in addition or alternatively
to the presence of the disk outflow presented here. The highly ionized
wind of GRS 1915+105 during high-accretion-rate phases has been
suggested to have a role in explaining the variability properties of the
source, thereby linking (X-ray) outflows and oscillation patterns.
Unfortunately, this system cannot be observed in the optical part of the
spectrum owing to high interstellar extinction. Furthermore, it is not
clear whether or not a similar coupling mechanism could be at work
at the much lower luminosities associated with both the variability
patterns and the neutral wind outflow observed in V404 Cyg.

The sustained disk wind that we have discovered in V404 Cyg could 
be a new fundamental driver in the accretion process of the largest, and 
hence most powerful, black-hole accretion disks. The outflow probably 
regulates the evolution of the outburst by depleting a sizeable fraction 
of the outer disk, thereby detaching the innermost regions, which are 
eventually accreted. This suggests behaviour analogous to that of the 
cold and massive outflows seen in active galactic nuclei, which shape 
their host galaxies at long distances from the central black hole.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
Optical spectroscopy. Observations. V404 Cyg was observed with the Optical
System for Imaging and low-Intermediate-Resolution Integrated Spectroscopy
(OSIRIS) located at the Nasmyth-B focus of the 10.4-m Gran Telescopio Canarias
(GTC), La Palma (Spain). We used three different optical grisms: R1000B (2.12 Å
per pixel), R2500R (1.04 Å per pixel) and R2500V (0.80 Å per pixel), which,
combined with a 1.0″ slit, give R = 611, 1,485 and 1,509, respectively. These
cover the spectral ranges 3,630-7,500 Å, 5,575-7,685 Å and 4,500-6,000 Å. The
complete observational set consists of 545 spectra obtained on 14 different
nights within the period 17 June to 1 July (see Extended Data Table 1). The slit
was rotated to PA = 59.15° to allow for simultaneous observations of a field star.

Data analysis. GTC spectra were bias- and flat-field-corrected using Image 
Reduction and Analysis Facility (IRAF) standard routines. The wavelength cali- 
bration was performed using Hg-Ar, Ne, and Xe lamps provided by the GTC team. 
Small velocity drifts (<20 km s⁻¹) due to instrumental flexure were measured
from the centroid of the O I λ = 5,577.340 Å and λ = 6,300.304 Å lines and used to
correct the individual spectra. We use routines within MOLLY
(http://deneb.astro.warwick.ac.uk/phsaap/software/molly/html/INDEX.html) and IDL
(Interactive Data Language) to perform further analysis of the spectra. Spectra were initially
flux-calibrated relative to the comparison field star included in the slit. The latter 
was calibrated relative to the spectrophotometric standard star Wolf 1346, using 
a low-resolution spectrum taken with a 10-arcsec slit during a photometric night 
at air mass ~1.05. Finally, we applied this flux calibration to the whole database. 
Balmer decrement. The BD was obtained by computing the flux ratio of Hα to Hβ
after subtracting the underlying continuum flux and correcting for reddening²¹
of E(B−V) = 1.3. Results are comparable to those obtained from the ratio of the peak
intensities of both lines, and results are consistent with case B recombination
(BD = 2.5-3) with the exception of days 11 to 15, where values as high as 6 are
observed. This epoch is the so-called nebular phase (see Extended Data Figs 1
and 5). We also note that values larger than ~3 are found corresponding to the
strongest P Cyg profiles witnessed on days 1, 2 and 6.

Ionization state. Given the dramatic flux changes observed in the X-ray and optical
bands, strong changes in the ionization state of the disk are expected as a result
of differing irradiation levels. A good, widely used tracker of the variable
irradiation of the outer disk is the He II λ = 4,686 Å to Hβ line flux ratio (I_ratio), as
it is nearly reddening-independent. Nevertheless, as for the BD, we compute this ratio
after subtracting the underlying continuum flux and correcting by E(B−V) = 1.3. Not
surprisingly, I_ratio is strongly correlated with the optical flux. Studies performed
on accreting white dwarfs have shown some distinctive optical properties when
I_ratio is larger than unity as a result of high ionization. In our analysis we find
that strong P Cyg profiles (blue absorption deeper than 5% of the continuum) are
always associated with I_ratio < 1. The deepest profiles (30% below the continuum)
are seen at I_ratio ≈ 0.5.

Hα profile. A visual inspection of the evolution of the Hα line profile reveals a
systematic asymmetry. We fitted the Hα profile for each spectrum using a Gaussian
model centred at the rest wavelength. We find that the line is redshifted during
the first 11 days (top panel in Extended Data Fig. 4), reaching velocities above
~100 km s⁻¹ during the first 8 days (that is, just before the brightest phase of
the outburst), corresponding to the strongest P Cyg profiles. Similarly, the V/R
ratio (defined as the ratio of the blue to red equivalent widths) confirms the line 
asymmetry observed up to day 11 (bottom panel in Extended Data Fig. 4). Error 
bars account for variability within every observing window. We interpret this as a 
result of continuous blue absorption (and extra red emission) present in the line 
profile, with the more extreme cases leading to P Cyg profiles. This is also sensitive 
to the presence of low-velocity outflows, which partially cover more central parts 
of the blue line profile. 

X-ray observations. Integral observatory. V404 Cyg was extensively monitored by
the INTEGRAL satellite during the 15-day-long outburst starting on MJD 57,190
(17 June 2015). The data were acquired during satellite revolutions 1,554 to 1,563.
In Extended Data Fig. 1 we present a 25-200-keV light curve obtained with the
Imager on Board the INTEGRAL Satellite (IBIS) and the upper-layer detector
INTEGRAL Soft Gamma-Ray Imager (ISGRI) in 64-s time bins.

The raw data were reduced in the standard way, using the Off-line Scientific
Analysis software (OSA) version 10.1
(http://www.isdc.unige.ch/integral/download/osa/doc/10.0/osa_inst_guide.pdf),
similar to that described in the cases of V404 Cyg and Cyg X-1 (same region of
the sky). IBIS is a coded mask telescope
and the data reduction process is iterative: each active source in a given field projects 
its own shadow onto the detector, and hence contributes to the overall background 
of the other sources. Hence, to extract scientific products of one specific object one 
must consider all other active/bright sources within the field. 

Our reduction procedure started with the production of sky images and mosa- 
ics (obtained from combining data acquired during the same satellite revolution) 
in the user’s pre-defined energy ranges (here we used 20-40 keV, 40-80 keV,
80-150 keV and 150-300 keV) to identify the most active sources over each
revolution. In this region of the sky (Cygnus) two persistent sources are very bright
and active in the hard X-ray/soft γ-ray domain (~0.2-1 MeV), Cygnus X-1 and
Cygnus X-3, in addition to V404 Cyg. Occasionally, some other objects may show
up (for example, EXO 2030+375), and are thus included in the reduction process
of the revolution concerned.

Light curves were then extracted in time steps of ~64 s over the 25-60-keV and
60-200-keV energy ranges. These energy bands are the same as those selected
by the INTEGRAL Science Data Centre quick look analysis facility
(http://www.isdc.unige.ch/integral/analysis#QLAsources), providing an independent
check and also avoiding any potential saturation effects in the 20-25-keV energy
range usually considered.
Swift observatory. Following the initial alert, V404 Cyg was monitored with the
Swift satellite throughout its outburst until it returned to quiescence. We have
analysed a total of 43 observations acquired with the X-ray Telescope (XRT)
taken over 17 days starting on MJD 57,188 (15 June 2015).

A total of 35 observations were performed in the windowed timing mode, while
8 were taken in the photon counting mode. Observations were processed using
the HEASOFT v.6.17 software, in particular the XRTPIPELINE task. For each 
observation the 0.5-10-keV spectrum, light curve and image were obtained using 
XSELECT. We used a circular region of 40-arcsec radius centred at the source 
position (the inner ~9-11 arcsec were excluded for those observations affected 
by pile-up). A region of similar size and shape, positioned on an empty sky region, 
was used for the background. We created exposure maps and ancillary response
files following the standard Swift analysis threads
(http://www.swift.ac.uk/analysis/xrt/), and we acquired the latest version of the
response matrix files from the High Energy Astrophysics Science Archive Research
Center (HEASARC) calibration database (CALDB).
Radio observations. V404 Cyg was observed extensively throughout its 2015
outburst between 13 GHz and 18 GHz by the AMI-LA radio telescope (Cambridge,
UK), operating as part of the University of Oxford’s 4 PI SKY
(http://www.4pisky.org) transients programme. The first data were obtained on
MJD 57,188.896 (15 June 2015) in robotic response to the Swift trigger 643949.
The observations took place within two hours of the Swift trigger, revealing a
bright (>100 mJy) and fading radio flare. Subsequently, we continued to monitor
the source for up to 10 h every day for the entire period covered by this report.

Quick-look images of the AMI-LA observations were obtained with the
fully-automated calibration and imaging pipeline, AMISURVEY. After the outburst,
a more careful calibration and radio-frequency interference excision of the raw
data was done using AMI-REDUCE. The calibrated data were then imported
to the CASA package. Light curves were extracted in time steps of ~1 s in six
channels across the 5-GHz bandwidth via vector-averaging of the UV-plane data.

Over the period of 15 days of maximum activity of V404 Cyg, flares with peak
flux density of up to 3 Jy at 16 GHz are seen. The rise of the flares is generally
optically thick, while the decay is optically thin, consistent with adiabatically
expanding blobs of plasma (constituting the jet).
Fundamental parameters of V404 Cyg. V404 Cyg is a dynamically confirmed
black hole X-ray binary with an orbital period of P_orb = 6.47 days (ref. 8). The
black-hole mass is in the range M_BH = (8–12) M⊙, the error budget being
dominated by the uncertainty in the orbital inclination. We use a black-hole mass of
M_BH = 10 M⊙ and a mass ratio of q = M_2/M_BH = 0.067 in every calculation
presented in this paper, where M_2 corresponds to the donor-star mass in units of
M⊙. This results in an orbital separation of 2.2 × 10⁷ km. The outer accretion disk
radius R_out (in units of kilometres) can be expressed as:

$$R_{\rm out} \approx 1.2 \times 10^{6}\, M_{\rm BH}^{1/3}\, P_{\rm orb}^{2/3},$$

where M_BH is expressed in units of M⊙ and P_orb in days. We obtain
R_out = 9 × 10⁶ km. We note that the vast majority of black holes have orbital
periods shorter than ~2 days, which results in R_out = 4.1 × 10⁶ km if we use the
same values as for V404 Cyg for M_BH and q.
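
As a quick numerical check of the two radii quoted above, the following sketch evaluates the scaling relation for the adopted parameters. It assumes the relation has the form R_out ≈ 1.2 × 10⁶ M_BH^(1/3) P_orb^(2/3) km (M_BH in solar masses, P_orb in days); the function name is hypothetical and is not part of any analysis package used in this work.

```python
# Sketch only: evaluates the outer-disk-radius scaling under the assumed form
# R_out ~ 1.2e6 km * M_BH^(1/3) * P_orb^(2/3), M_BH in solar masses, P_orb in days.

def outer_disk_radius_km(m_bh_msun: float, p_orb_days: float) -> float:
    """Return the approximate outer accretion-disk radius in kilometres."""
    return 1.2e6 * m_bh_msun ** (1.0 / 3.0) * p_orb_days ** (2.0 / 3.0)


# V404 Cyg values adopted in the text (M_BH = 10 Msun, P_orb = 6.47 d)
r_v404 = outer_disk_radius_km(10.0, 6.47)
# Same black-hole mass, but a 2-day orbit typical of other systems
r_short = outer_disk_radius_km(10.0, 2.0)

print(f"V404 Cyg: {r_v404:.2e} km; 2-day orbit: {r_short:.2e} km")
```

Under these assumptions the two values come out close to 9 × 10⁶ km and 4.1 × 10⁶ km, so the disk of V404 Cyg is roughly twice as large as that of a typical short-period system.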

P Cyg profiles. We discovered strong P Cyg profiles in both the hydrogen (Balmer)
and helium (He I) emission lines. They result from resonant scattering in an
expanding outflow, and are well reproduced by models using a variety of velocity
laws and a spherical geometry.

Profile fitting. To constrain the velocities associated with the He I (λ = 5,876 Å)
P Cyg profile, we fitted every individual spectrum as follows:

(1) We fitted a Gaussian to the central disk emission after masking both the blue 
P Cyg absorption and the red high-velocity emission bump. This fitted model 
was subtracted from the data. 
(2) We subsequently fitted a two-Gaussian model to the residuals, one in
absorption to the P Cyg and another in emission to the red bump. To avoid degeneracy
in the fit, both Gaussians were offset by the same velocity (sign understood) and
set to have the same width. The intensities were left as free parameters.

Fits provide a good description of the data for relatively deep profiles, as
indicated by our visual inspections. We take the wind mean velocity to be the offset
velocity, while V_T (the wind terminal velocity) is determined by adding to the
wind mean velocity the half-width at 1/10th of the intensity. For days 1, 2 and 6,
where strong profiles are present throughout the observations, we were able to
track their evolution with time. On day 2 (Fig. 2) outflows are detected during the
2 h observation, but their properties (profile strength and velocity) change in
response to flaring. Gaussian fits show a constant V_T throughout the observation,
while both the amplitude and mean velocity vary following changes in the optical
flux and ionization state. A similar flare-outflow correlation is witnessed on day
6, where we measure V_T = 3,000 km s⁻¹.
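
The terminal-velocity definition above can be sketched as follows. For a Gaussian of standard deviation σ, the half-width at one-tenth of the peak intensity is σ√(2 ln 10) ≈ 2.15σ. The function names and the example fit values are illustrative assumptions, not measurements from this work.

```python
import math

# Sketch only: terminal velocity from a Gaussian fit, taken as the offset
# (mean wind) velocity plus the half-width at 1/10 of the peak intensity.

def half_width_at_tenth(sigma_km_s: float) -> float:
    """Half-width of a Gaussian at one-tenth of its peak, in km/s."""
    return sigma_km_s * math.sqrt(2.0 * math.log(10.0))


def terminal_velocity(offset_km_s: float, sigma_km_s: float) -> float:
    """V_T = |offset velocity| + half-width at 1/10 intensity, in km/s."""
    return abs(offset_km_s) + half_width_at_tenth(sigma_km_s)


# Hypothetical fit: absorption centred at -1,000 km/s with sigma = 466 km/s
v_t = terminal_velocity(offset_km_s=-1000.0, sigma_km_s=466.0)
print(f"V_T = {v_t:.0f} km/s")
```

Because both Gaussians in the fit share one offset and one width, a single (offset, σ) pair per spectrum is enough to track V_T with time.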

Possible physical interpretations. A variety of wind-launching mechanisms have
been proposed in the literature. Here, we briefly discuss the three most widely used.
(1) Radiation pressure. The wind results from Thomson scattering when the
radiation field approaches L_Edd. Given the low luminosity at which the most
conspicuous P Cyg profiles are detected (0.1%-1% of L_Edd), and the similar V_T
values observed during the outburst, radiation pressure is probably not
responsible for the observed phenomenology. However, the system might have reached
L_Edd during the brightest phases of the outburst preceding the so-called nebular
phase, and this mechanism could have contributed to the observed optically
thin shell. On the other hand, we note that the velocity observed during this
phase is similar to that measured in the P Cyg profiles, suggesting a common
origin.

(2) Line-driven winds. Such winds are expected to be inefficient in low-mass
X-ray binaries since X-ray emission from the disk would over-ionize the
wind.

(3) Thermal wind scenario. In this scenario, atoms reach a thermal velocity
larger than the escape velocity and a wind is formed. Therefore, the launching
radius R_l is approximately that for which an associated Keplerian velocity equals
V_T. For V_T = 3,000-1,500 km s⁻¹ we obtain R_l = (1.5–6) × 10⁵ km, respectively.

Using a standard accretion disk model we can estimate the surface temperature
of the disk at a given radius R by:

$$T_{\rm D}(R) = \left[\frac{3 G M_{\rm BH} \dot{M}_{\rm acc}}{8\pi R^{3}\sigma}\left(1-\sqrt{\frac{R_0}{R}}\right)\right]^{1/4}$$

where G is the gravitational constant, σ is the Stefan-Boltzmann constant, and R_0
is the disk inner radius, $R_0 = 6GM_{\rm BH}/c^{2}$.

The accretion rate is obtained using:

$$\dot{M}_{\rm acc} = \frac{L_{\rm out}}{\eta c^{2}}$$

where L_out is the Eddington-scaled luminosity at the time of the outflow, c is the
speed of light and η = 0.1 is the accretion efficiency. Using L_out = (0.001-0.01) L_Edd,
we obtain T_D in the range 5,000-20,000 K, which is consistent with having both
neutral hydrogen and helium.
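
The thermal-wind estimate can be reproduced to order of magnitude with the sketch below. It assumes the launching radius is where the Keplerian velocity equals V_T (R_l = GM_BH/V_T²), the standard thin-disk surface temperature with inner radius R_0 = 6GM_BH/c², an accretion rate Ṁ_acc = L_out/(ηc²), and an Eddington luminosity of 1.26 × 10³⁸ erg s⁻¹ per solar mass. That last value and all function names are assumptions introduced here.

```python
import math

# Physical constants in cgs units
G = 6.674e-8              # gravitational constant
C_LIGHT = 2.998e10        # speed of light
SIGMA_SB = 5.670e-5       # Stefan-Boltzmann constant
M_SUN = 1.989e33          # solar mass in grams
L_EDD_PER_MSUN = 1.26e38  # assumed Eddington luminosity per solar mass (erg/s)


def launching_radius_cm(m_bh_msun: float, v_t_km_s: float) -> float:
    """Radius at which the Keplerian velocity equals V_T."""
    v_cm_s = v_t_km_s * 1e5
    return G * m_bh_msun * M_SUN / v_cm_s ** 2


def disk_temperature_k(m_bh_msun: float, l_out_edd: float, r_cm: float,
                       eta: float = 0.1) -> float:
    """Thin-disk surface temperature at radius r_cm for L_out in Eddington units."""
    m_bh = m_bh_msun * M_SUN
    mdot = l_out_edd * L_EDD_PER_MSUN * m_bh_msun / (eta * C_LIGHT ** 2)
    r_in = 6.0 * G * m_bh / C_LIGHT ** 2
    t4 = (3.0 * G * m_bh * mdot / (8.0 * math.pi * r_cm ** 3 * SIGMA_SB)
          * (1.0 - math.sqrt(r_in / r_cm)))
    return t4 ** 0.25


r_fast_km = launching_radius_cm(10.0, 3000.0) / 1e5  # V_T = 3,000 km/s
r_slow_km = launching_radius_cm(10.0, 1500.0) / 1e5  # V_T = 1,500 km/s
t_hot = disk_temperature_k(10.0, 0.01, r_fast_km * 1e5)
t_cool = disk_temperature_k(10.0, 0.001, r_slow_km * 1e5)

print(f"R_l = {r_fast_km:.1e} to {r_slow_km:.1e} km")
print(f"T_D = {t_cool:.0f} to {t_hot:.0f} K")
```

With these inputs the radii land near 1.5 × 10⁵ km and 6 × 10⁵ km, and the temperatures span a few thousand to roughly twenty thousand kelvin, the same order as the range quoted above.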

Mass outflow rate. A crude estimate of the mass loss can be obtained by direct
comparison of the strongest profiles (day 2) with the classical atlas of theoretical
P Cyg profiles. Following studies of cataclysmic variables we calculated the rate
of mass outflow (Ṁ_out) (in units of solar masses per year) as follows:

$$\dot{M}_{\rm out} \approx 1.1 \times 10^{-18} \times \frac{T R V_T^{2} C}{f_i A g \lambda_0}$$

where T is the average opacity, λ_0 is the rest wavelength, f_i is the ionization fraction,
A is the helium abundance, R is the radius of the emitting region (in solar radii)
and g is the oscillator strength. C is an integral depending on the shape of the P Cyg
profile, which takes typical values in the range 0.2-0.5. By visual inspection we set
T ≈ 2 by comparing our profiles with those of the atlas.

Using V_T = 2,000 km s⁻¹, g = 0.61, C = 0.2 and A = 0.08 we obtain
Ṁ_out > 1 × 10⁻¹⁴ M⊙ yr⁻¹. Note that this is a lower limit because (1) we have
assumed that f_i = 0.5, which might be much lower depending on, for example, the
exact value of the wind-launching radius (although the low He II (λ = 4,686 Å)/Hβ
intensity ratio I_ratio ≈ 0.5 advocates for a substantial fraction of neutral helium),
and (2) we used R = 6 × 10⁵ km for the proposed launching radius, which could be
much larger if, for example, the shell remains optically thick ~300 s (0.003 days)
after the outflow is launched. Indeed, if the shell is optically thick for at least 0.01
days (see below), we obtain Ṁ_out > 10⁻¹³ M⊙ yr⁻¹ (for C = 0.5).


On the other hand, the contemporaneous Eddington-scaled accretion rate for
L_out = 0.001 L_Edd would be Ṁ_acc ≈ 10⁻¹⁰ M⊙ yr⁻¹ and then Ṁ_out/Ṁ_acc > 10⁻³.


Nebular phase. From days 9 to 11 we observe major changes in the spectrum,
as the X-ray, radio and optical fluxes drop by 2-3 orders of magnitude from the
outburst peak: (1) A weak P Cyg profile is still present on days 9 and 10, which
means that the ejecta are optically thick; (2) higher-excitation emission lines
become strong on day 10 and Balmer line equivalent widths start to increase;
(3) on day 11, emission lines become unprecedentedly broad and intense, showing
zero-intensity breadths of ~6,000 km s⁻¹ and equivalent widths of ~2,000 Å (Hα;
Fig. 3, Extended Data Fig. 1). This results from material expanding at the outflow
velocity (±3,000 km s⁻¹) and becoming optically thin, as typically observed in
expanding nova shells. We note that Hα saturated the detector on day 10, so its
intensity has to be taken as a lower limit. Similar spectra to that presented in Fig. 3
(day 11) were observed during days 12, 13 and 14, although line intensities
progressively decay. Day 15 data show features typical of quiescent black hole
transients, including a double-peaked Hα emission line.

Diffusion timescale and ejected mass. The timescale of this transition is the diffusion
timescale of an expanding shell with mass M_shell, and it is estimated²¹ to be
t_dif ≈ 23 days (1/R_15)(M_shell/M⊙), where R_15 is the radius of the spherical envelope
in units of 10¹⁵ cm. The transition from optically thick to optically thin ejecta
occurs between days 10 and 11, as we observe a P Cyg profile and BD = 2.5 on day
10 and broad wings and BD ~5 on day 11 (Extended Data Fig. 5). This means that
the outflow becomes optically thin in t_dif < 1 day if ejection of matter continues up
to day 10. On the other hand, on day 11 we have two separate groups of
observations with BD increasing across ~0.01-day timescales, which suggests that this
timescale could be relevant in the expansion. Extrapolating from this variation, we
predict a maximum t_dif ≈ 0.1 day. On the other hand, we do not observe substantial
changes from spectrum to spectrum (t_dif > 0.002 days). Assuming a 3 × 10⁵ km
launching radius and material travelling at 3,000 km s⁻¹ we obtain R_15 = 9 × 10⁻⁵ to
3 × 10⁻³. This yields M_shell ≈ (10⁻⁸–10⁻⁵) M⊙ for t_dif ≈ 0.002–0.1 days, respectively. This
order-of-magnitude calculation is consistent with the amount of matter expected
to be stored in a large accretion disk such as that of V404 Cyg (~10⁻⁵ M⊙).
Similarly, it also explains nicely why the optically thin nebula is detected right
after the end of the outburst (<1 day), a much shorter timescale than typically
observed in supernovae (tens to hundreds of days) where >(1–10) M⊙ are
expelled²¹. The above estimates are quite approximate (at least a factor of ~2) and
they assume a spherical geometry. Nevertheless, our results are consistent with a
substantial fraction of the disk being ejected during the outburst. On the other
hand, an increase in the equivalent hydrogen column density (Ny of up to a few 
times 10”) has been reported for both the 1989” and 2015” outbursts. Assuming 
a spherical geometry for the wind and constant density across the outflow, we
predict N_H ≈ 10²¹–10²⁴ cm⁻² if M_shell ≈ (10⁻⁸–10⁻⁵) M⊙ were expelled.
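
The order-of-magnitude shell masses can be checked with a short sketch. It assumes the relation t_dif ≈ 23 days (1/R_15)(M_shell/M⊙) inverted for M_shell, with the shell radius taken as the launching radius plus the distance travelled at constant velocity over t_dif. The function names are hypothetical.

```python
# Sketch only: shell mass implied by a given diffusion timescale, assuming
# t_dif ~ 23 d * (M_shell / Msun) / R_15 and free expansion at constant speed.

SECONDS_PER_DAY = 86400.0


def shell_radius_r15(t_days: float, launch_radius_km: float = 3e5,
                     velocity_km_s: float = 3000.0) -> float:
    """Shell radius after t_days, in units of 1e15 cm."""
    radius_km = launch_radius_km + velocity_km_s * t_days * SECONDS_PER_DAY
    return radius_km * 1e5 / 1e15  # km -> cm -> units of 1e15 cm


def shell_mass_msun(t_dif_days: float, r15: float) -> float:
    """Invert t_dif = 23 d * M_shell / R_15 for the shell mass in solar masses."""
    return t_dif_days * r15 / 23.0


r15_short = shell_radius_r15(0.002)
r15_long = shell_radius_r15(0.1)
m_short = shell_mass_msun(0.002, r15_short)
m_long = shell_mass_msun(0.1, r15_long)

print(f"R_15 = {r15_short:.1e} to {r15_long:.1e}")
print(f"M_shell = {m_short:.1e} to {m_long:.1e} Msun")
```

Under these assumptions R_15 runs from roughly 8 × 10⁻⁵ to 3 × 10⁻³ and the shell mass from about 10⁻⁸ to 10⁻⁵ M⊙, in line with the rounded values given in the text.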

Mass transferred by the donor during the inter-outburst period. Given the long
orbital period of V404 Cyg, the mass transfer rate from the donor Ṁ_2 (in units of
solar masses per year) can be estimated using the following expression:

$$-\dot{M}_2 = 4.0 \times 10^{-10}\, P_{\rm d}^{0.93}\, M_2^{1.47}$$

where P_d is the orbital period in days. This results in −Ṁ_2 ≈ 1.3 × 10⁻⁹ M⊙ yr⁻¹,
which translates into −ΔM_2 ≈ 3 × 10⁻⁸ M⊙ across the 26-yr-long inter-outburst
period.
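
A short sketch of this estimate follows. The exponents (0.93 on the period, 1.47 on the donor mass) are an assumption here: they are the values of the widely used scaling for evolved low-mass donors and, with the parameters adopted in this paper, they reproduce the quoted rate. The donor mass is taken as M_2 = q M_BH = 0.67 M⊙.

```python
# Sketch only: donor mass-transfer rate, assuming
# -Mdot_2 = 4.0e-10 * P_d^0.93 * M_2^1.47 (Msun/yr), P_d in days, M_2 in Msun.

def mass_transfer_rate(p_days: float, m2_msun: float) -> float:
    """Return the mass-transfer rate from the donor in solar masses per year."""
    return 4.0e-10 * p_days ** 0.93 * m2_msun ** 1.47


m_bh = 10.0          # adopted black-hole mass (Msun)
q = 0.067            # adopted mass ratio M_2 / M_BH
m2 = q * m_bh        # donor mass (Msun)

rate = mass_transfer_rate(6.47, m2)
transferred = rate * 26.0  # 26 years between the 1989 and 2015 outbursts

print(f"rate = {rate:.2e} Msun/yr, transferred = {transferred:.1e} Msun")
```

The result is close to 1.3 × 10⁻⁹ M⊙ yr⁻¹ and about 3 × 10⁻⁸ M⊙ over the quiescent interval, which is what makes the transferred, accreted and ejected masses comparable.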

Accreted mass estimate. The total mass accreted during the 2015 outburst based on
the observed X-ray luminosity can be estimated using the following expression:

$$\Delta M_X = \frac{\int L_X\,\mathrm{d}t}{\eta c^{2}}$$

where ∫L_X is the integrated X-ray luminosity throughout the outburst. This was
estimated by converting the observed Integral count rate to flux in the 10 keV to
1 MeV band (https://heasarc.gsfc.nasa.gov/cgi-bin/Tools/w3pimms/w3pimms.pl).
We assumed"! a power-law spectral model with a photon index in the range 
T’=1-2 and Ny 0.7 x 10, even though the result does not depend on the 
N_H value. Using η = 0.1 we obtain ΔM_X = (0.6–2.3) × 10²⁵ g, that is,
(0.3–1.1) × 10⁻⁸ M⊙. The mass that is implied by the soft X-ray luminosity
(0.5-10 keV), and that is therefore sensitive to (variable) absorption effects, is esti- 
mated from both Swift and Integral (extrapolation) to be only a few times 10” g. 
Our results are compatible with the value of ΔM_X ≈ 3 × 10²⁵ g inferred during
the 1989 outburst, showing that only ~(0.5–1) × 10⁻³ of the total mass stored in
the disk (R_out = 9 × 10⁶ km) was accreted. The disk mass varies as ~R_out³,
which in turn implies that the accreted mass corresponds to that within
R_acc ≈ (6–9) × 10⁵ km.
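
The two conversions in this paragraph can be sketched as follows: grams to solar masses for the accreted mass, and the accreted fraction to a radius under the stated assumption that the enclosed disk mass scales as R³. The function names are hypothetical.

```python
# Sketch only: unit conversion and R^3 scaling used for the accreted-mass estimate.

M_SUN_G = 1.989e33  # solar mass in grams


def grams_to_msun(mass_g: float) -> float:
    """Convert a mass in grams to solar masses."""
    return mass_g / M_SUN_G


def accretion_radius_km(fraction: float, r_out_km: float = 9e6) -> float:
    """Radius enclosing a given fraction of the disk mass if M(<R) scales as R^3."""
    return r_out_km * fraction ** (1.0 / 3.0)


dm_low = grams_to_msun(0.6e25)
dm_high = grams_to_msun(2.3e25)
r_acc_low = accretion_radius_km(0.5e-3)
r_acc_high = accretion_radius_km(1.0e-3)

print(f"dM_X = {dm_low:.1e} to {dm_high:.1e} Msun")
print(f"R_acc = {r_acc_low:.1e} to {r_acc_high:.1e} km")
```

The mass range reproduces (0.3–1.1) × 10⁻⁸ M⊙, and the radii fall at roughly (7–9) × 10⁵ km, close to the quoted (6–9) × 10⁵ km given the approximate nature of the scaling.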

Renewed activity of V404 Cyg, at a much fainter level, was detected during
the period 23 December 2015 to 5 January 2016, that is, only about six months
after the main 2015 outburst. This timescale is consistent with the viscous time for
refilling R_acc as compared with accreting white dwarfs. This can be expressed as:


ty © Nwp X 2.87 x 10’RYo'M py 


where R_10 is the radius in units of 10¹⁰ cm and N_WD was (roughly) calibrated to be
~0.05 using observations of accreting white dwarfs. Using R_acc ≈ (6–9) × 10⁵ km,
we obtain t_v ≈ 140-180 days, which is compatible with the time lapse between the
two outbursts.
Extended Data Figure 1 | Evolution of the main parameters during the
outburst. Zero time is set to 17 June 00:00 UT. From top to bottom we
show hard (25-200 keV) and soft (0.5-10 keV) normalized X-ray count
rates, radio flux (~16 GHz), optical continuum flux, He I λ = 5,876 Å
equivalent width (EW; positive for absorption) in the range −3,000 km s⁻¹
to 0 km s⁻¹ and Hα equivalent width. The Balmer decrement (black) and
I_ratio (red) are shown in the bottom panel. In the top panel, X-rays have
been normalized to their respective peak at ~L_Edd and the time intervals
corresponding to the GTC observations have been greyed out for clarity.
Extended Data Figure 2 | Trail spectrum showing GTC spectra taken during days 1-6. P Cyg profiles are apparent in seven transitions of neutral
hydrogen (Hα and Hβ) and helium. The strongest are observed on days 1, 2 and 6, being more prominent in the He I λ = 5,876 Å transition. Similar
profiles are seen in another five transitions at shorter wavelengths (not shown).
Terminal velocities in the range V_T = 1,500-2,000 km s⁻¹ are observed
Extended Data Figure 4 | Evolution of the Hα emission line. The
top panel shows the evolution of the centroid of the Hα line. Positive
velocity values are due to line asymmetries by blue absorption and red
emission. The bottom panel shows the V/R parameter (the emission line
is symmetric if V/R = 1) showing the same phenomena. This strongly
suggests the presence of continuous outflows from the outer disk
along the whole outburst. Error bars indicate the standard deviation of
measurements within each observing window.
Extended Data Figure 5 | BD evolution through the nebular phase. From day 10, the BD is observed to increase sharply, reaching ~5 on day 11 and ~6
Extended Data Table 1 | Log of the GTC observations

Night        Grism    T_exp [s]  TR [s]  N_spec
17-06-2015   R1000B   60         84      36
18-06-2015   R1000B   60         84      75
19-06-2015   R1000B   60         84      75
20-06-2015   R1000B   60         84      75
21-06-2015   R1000B   60         84      40
22-06-2015   R1000B   60         84      85
23-06-2015   R1000B   60         84      36
24-06-2015   R1000B   60         84      36
25-06-2015   R1000B   60         84      17
26-06-2015   R1000B   40         64      3
             R2500R   70         94      6
             R2500V   70         94      6
27-06-2015   R1000B   20         44      3
             R2500V   70         94      3
             R2500R   35         59      3
             R1000B   20         44      3
             R2500V   70         94      3
             R2500R   35         59      3
28-06-2015   R1000B   20         -       1
             R1000B   60         84      5
             R2500V   360        384     3
             R2500R   120        144     3
             R1000B   120        144     13
29-06-2015   R1000B   120        144     2
             R2500R   120        144     2
01-07-2015   R1000B   120        144     2
             R2500R   120        144     2
             R1000B   60         84      2
             R2500R   60         84      2

T_exp is the exposure time per spectrum in seconds, TR is the actual time resolution and N_spec is
the number of spectra taken on a given day and with a given configuration.
Vigorous convection as the explanation for Pluto’s polygonal terrain

A. J. Trowbridge, H. J. Melosh, J. K. Steckloff & A. M. Freed


Pluto’s surface is surprisingly young and geologically active¹. One 
of its youngest terrains is the near-equatorial region informally 
named Sputnik Planum, which is a topographic basin filled by 
nitrogen (N2) ice mixed with minor amounts of CH4 and CO ices¹. 
Nearly the entire surface of the region is divided into irregular 
polygons about 20-30 kilometres in diameter, whose centres rise 
tens of metres above their sides. The edges of this region exhibit 
bulk flow features without polygons¹. Both thermal contraction 
and convection have been proposed to explain this terrain¹, but 
polygons formed from thermal contraction (analogous to ice-wedges 
or mud-crack networks)²,³ of N2 are inconsistent with 
the observations on Pluto of non-brittle deformation within the 
N2-ice sheet. Here we report a parameterized convection model 
to compute the Rayleigh number of the N2 ice and show that it 
is vigorously convecting, making Rayleigh-Bénard convection 
the most likely explanation for these polygons. The diameter of 




Figure 1 | New Horizons image of Sputnik Planum on Pluto. A mosaic 
image of Sputnik Planum is shown in a. Within the centre of the ice field, 
where the ice is presumably thickest, the polygons are approximately 
30 km across¹. Close to the edge, the average polygon diameter decreases 
to 20 km and the polygons then vanish, leaving a smooth surface. A contrast- 


Sputnik Planum’s polygons and the dimensions of the ‘floating 
mountains’ (the hills of water ice along the edges of the polygons) 
suggest that its N2 ice is about ten kilometres thick. The estimated 
convection velocity of 1.5 centimetres a year indicates a surface age 
of only around a million years. 

Previous work first proposed that convection or thermal contraction 
could have formed the polygons on Sputnik Planum¹ (see Fig. 1). 
However, we find contraction unlikely: studies of Arctic ice-wedges 
show that the spacing of thermal contraction polygons is typically 
about five times the annual thermal skin depth (the depth to which 
the summer-winter thermal wave penetrates into the surface of a planetary 
body)⁴. Using reasonable values for the thermal diffusivity of N2 
ice⁵, we compute the annual thermal skin depth to be about 100 m, 
corresponding to thermal contraction polygons around 500 m across; 
this is nearly two orders of magnitude smaller than the observed polygons. 
Furthermore, contractional polygons require brittle failure of 




enhanced version of a is given in b to better illuminate the polygons. The 
‘floating mountains’ are observable within the edges of these polygons, 
and can be seen in c, the zoom of the rectangle in a. Image credit: NASA/ 
Johns Hopkins University Applied Physics Laboratory/Southwest Research 
Institute (2015). 
Figure 2 | Calculated convection for Sputnik Planum polygons. The 
calculated Rayleigh number and Nusselt number as a function of surface 
heat flow (q_s) and thickness of convecting layer (L) are shown in a and b, 
respectively. The blue line is the Rayleigh number; the green line is the 
Nusselt number (see Methods section). The black dot marks the point 
where the Rayleigh number reaches the critical value (~1,000) at which 
convection just begins. At this point, the Nusselt number equals 1, and 
heat is transferred entirely by conduction. The calculation as a function 
of surface heat flow (a) shows that the Rayleigh number remains above 
~1,000 for surface heat flows down to 4 × 10⁻⁴ mW m⁻² for a 10-km-thick 
N2 layer. The estimated surface heat flow for Pluto (3 mW m⁻²) is marked 
by a vertical red line. For a constant heat flow of 3 mW m⁻², the Rayleigh 
number (b) decreases as the convecting layer thins, until 
convection stops at a thickness of 425 m. 


the ice. However, viscoelastic deformation of N2 ice over annual (248 
Earth years) and diurnal (153 Earth hours) cycles⁶ on Pluto can easily 
relax differential stresses on this timescale, preventing brittle failure 
in response to such slowly building stress (the Maxwell time, over 
which stresses relax by 1/e, of N2 ice at 40 K is about 4 min at a stress 
of 0.1 MPa). Although there are other ways to generate polygons (such 
as compaction of sediments over heavily cratered terrain⁷, extensional 
tectonic processes⁸ or contraction of cooling cryovolcanic flows⁹,¹⁰), 
these processes occur on timescales that are significantly longer than 
the Maxwell time, and are inconsistent with the lack of craters and 
observed flow features within this region. 

The viability of Rayleigh-Bénard convection as an explanation for 
Pluto’s polygons depends critically on the thickness of the convecting 
layer¹¹. Because spectral data probe only micrometres into the surface, 
the N2 ice could be only a thin surface veneer, making convection 
impossible. However, this possibility is unlikely given the high 
exchange rates of the N2 atmosphere with a surface ice reservoir¹². 
We can estimate the ice thickness from the observed polygon size 
Figure 3 | The N2-CO phase diagram. We obtained this figure by 
collecting data points from experimental measurements¹⁷. Above CO 
concentrations of 10%, both phases of nitrogen are stable at Pluto’s surface 
temperatures. 


and the diameter-to-depth ratio (that is, aspect ratio) for a Rayleigh-Bénard 
convection cell. Laboratory convection experiments and three-dimensional 
numerical modelling almost invariably predict aspect 
ratios near 3:1 (ref. 11), as assumed here. Some two-dimensional numerical 
simulations of convection in fluids with strongly temperature-dependent 
viscosity predict larger aspect ratios¹³,¹⁴, implying a thinner 
ice layer. However, N2-ice viscosity depends only weakly on temperature¹⁵ 
and falls in the small-viscosity-contrast regime that precludes 
aspect ratios larger than 3:1 (refs 14, 16). Moreover, if Pluto’s ‘floating 
mountains’ (see Fig. 1c) are truly supported by their buoyancy, then 
their heights, widths and the small density contrast between N2 ice and 
water ice require an N2-ice thickness of at least 5 km, as shown in the 
Methods. We account for uncertainty in the layer thickness by widely 
varying the depth of the convection cell within our model (Fig. 2). 

Using extrapolated N2-ice rheology measurements¹⁵ (see Methods 
and parameters used in equation (9)), a surface temperature of 33 K, 
and a surface heat flux q_s of ~3 mW m⁻² (consistent with the radiogenic 
heat generated by a carbonaceous chondrite core about 900 km 
in radius, as suggested by Pluto’s mean density), we calculate a Rayleigh 
number Ra > 10⁷ and an interior temperature of approximately 40 K 
for the N2-ice layer. This Rayleigh number is four orders of magnitude 
greater than the critical value that denotes the onset of convection 
(Ra_crit ≈ 1,000), suggesting that Sputnik Planum’s N2 ice is vigorously 
convecting. 

Figure 2 shows the calculated Rayleigh and Nusselt numbers for 
a range of surface heat flows and N2 thicknesses. In our model, the 
Rayleigh number remains above the critical value for surface heat 
flows as small as 4 × 10⁻⁴ mW m⁻² (see Fig. 2a), suggesting that our 
results are robust against uncertainties in our estimated surface heat 
flux. Figure 2b illustrates that as the thickness of the convecting cell 
decreases, the Rayleigh number drops until convection ceases at a 
thickness of 425 m (at a nominal heat flow of 3 mW m⁻²). Thus, the 
observed decrease in polygon size away from the centre of Sputnik 
Planum, and their absence at its edges (see Fig. 1), both suggest that 
Sputnik Planum’s N2 ice is thickest at the centre, and thins 
to around 400 m near the edges, where the polygons are absent. This 
result is consistent with the hypothesis that N2 ice in Sputnik Planum 
fills a topographic basin. 

From the calculated average velocity of convection, ~1.5 cm yr⁻¹ 
(the equation for velocity is given in the Methods), we compute the 
time needed for the ice surface to renew itself, and therefore the maximum 
age of the surface of Sputnik Planum, to be about one million 
years. This is consistent with the lack of significant cratering, 
and further constrains the existing age estimates of a few hundred 
million years¹ by two orders of magnitude. The convection model 
also correctly predicts the topography of the polygons. The centres 
of convection cells are underlain by (relatively) warm rising currents 
and should therefore stand higher than the edges of the cells, where 
cooler ice descends. The elevation of the polygon centres above their 
edges is estimated from the convective buoyancy stress to be about 
80 m (explained in more detail in the Methods), in good agreement 
with observations¹. 

Our predicted central temperature of the convection cell of 
40 K is close to the α-to-β phase transition temperature for N2 ice 
(T_αβ = 35.61 K for pure N2 ice, but is higher if CO is dissolved in the N2 
ice¹⁷). Furthermore, high concentrations of CO have been reported on 
Sputnik Planum¹. CO and N2 form a complete solid solution series (see 
Fig. 3). With a CO concentration >10%, both phases are stable at 40 K 
(ref. 17), which may allow a phase-change-induced mixed convection 
system to develop at these high Rayleigh numbers¹⁸. This may consist 
of warm β-N2 upwelling in the convection cells, while α-N2 concentrates 
in the troughs of the polygonal features, which might explain 
the albedo difference between the polygons and troughs. Owing to 
the difference in absorption spectra for each phase, New Horizons’ 
Alice instrument can test this prediction of our convection model. 
Alternatively, a two-layer convection system may develop, in which the 
β-N2 forms the lower, deeper convection cell while α-N2 forms a layer 
of convection cells above. However, the positive Clapeyron slope of 
the α-to-β phase transition makes this alternative scenario unlikely¹⁹. 
Rather, the exothermic β-to-α phase change encourages single-cell 
overturn and produces a more stable convecting regime¹⁹. 

The N2-ice mass in Sputnik Planum seems to be the largest concentration 
of N2 on Pluto. If our estimated 10-km ice thickness were 
evenly spread across the planet it would form a layer about 350 m 
thick that would, if converted to vapour, produce an atmosphere with 
a surface pressure of about one bar instead of the currently observed 
pressure of approximately ten microbars. The present atmospheric 
pressure is presumably controlled by the vapour pressure of N2 ice at 
the current surface temperature, so that N2 can either evaporate or 
condense onto the ice reservoir in Sputnik Planum as temperature 
varies during Pluto’s seasons and annual excursions from the Sun. N2 
ice should thus be mobile across Pluto’s surface. We do not at present 
understand why most of the N2 on Pluto is concentrated in what 
appears to be the basin of a large ancient impact crater: that must be 
the subject of future climatological studies. However, it is an obser- 
vational fact that most of the N2 on Pluto is concentrated in a single 
large mass that lies in a basin nearly at the equator rather than at its 
poles, which is perhaps related to Pluto’s large obliquity. The polygonal 
surface features of this mass indicate that it is vigorously convecting, 
and this convection is driven by the small amount of heat conducted 
through Pluto's lithosphere. 

Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 


Maxwell time calculation. The Maxwell time (τ_M, in units of seconds) is determined 
from the ratio of the effective viscosity (η_eff) to the shear modulus (μ, in 
units of pascals): 

$$\tau_M = \frac{\eta_{\mathrm{eff}}}{\mu} \qquad (1)$$

The shear modulus was determined from experimentally measured shear wave 
velocities (v_s, in units of metres per second)²⁰, while the viscosity parameters are 
derived from the measurements of N2 ice¹⁵ and are listed below. The shear modulus 
is related to the shear wave velocity and material density (ρ, in units of kilograms 
per cubic metre) by the expression: 

$$\mu = v_s^2 \rho \qquad (2)$$

At a differential stress of 0.1 MPa and a temperature of 30 K, the Maxwell time is 
about 4 min. 

Temperature-dependent parameters for Rayleigh number. The temperature-dependent 
values for the parameters in equation (7) (see below) were determined 
from best-fitting experimental data²¹. The density, volume coefficient of expansion 
(α, in units of per kelvin), and thermal conductivity (k, in units of watts per 
metre per kelvin) of N2 are expressed by the following equations: 

$$\rho = 0.0134T^2 - 0.6981T + 1038.1 \qquad (3)$$

$$\alpha = (2 \times 10^{-6})T^2 - 0.0002T + 0.006 \qquad (4)$$

$$k = 0.1802T^{0.1041} \qquad (5)$$

The thermal diffusivity (κ, in units of metres squared per second) was determined 
from the relationship κ = k/(ρ c_p), where c_p (in units of joules per kilogram 
per kelvin) is the specific heat at constant pressure. Specific heat is also temperature 
dependent, which is expressed as: 

$$c_p = 926.91\,e^{0.0093T} \qquad (6)$$
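
As a rough illustration of how equations (3) to (6) combine, the following Python sketch evaluates the fitted N2-ice properties and the thermal diffusivity κ = k/(ρ c_p) at a chosen temperature. The function names are illustrative only. The 10⁻⁶ factor in equation (4), the exponential form of equation (6) and the unit of k (taken here as W m⁻¹ K⁻¹) are readings of a damaged source and should be treated as assumptions.

```python
import math


def density(T):
    """N2-ice density in kg m^-3 from equation (3); T in kelvin."""
    return 0.0134 * T**2 - 0.6981 * T + 1038.1


def expansivity(T):
    """Volume coefficient of expansion in K^-1, equation (4).

    The 2e-6 prefactor is an assumed reading of the source.
    """
    return 2e-6 * T**2 - 0.0002 * T + 0.006


def conductivity(T):
    """Thermal conductivity from equation (5), assumed to be in W m^-1 K^-1."""
    return 0.1802 * T**0.1041


def specific_heat(T):
    """Specific heat in J kg^-1 K^-1, equation (6) (assumed exponential form)."""
    return 926.91 * math.exp(0.0093 * T)


def diffusivity(T):
    """Thermal diffusivity kappa = k / (rho * c_p), in m^2 s^-1."""
    return conductivity(T) / (density(T) * specific_heat(T))


if __name__ == "__main__":
    T = 40.0  # kelvin, close to the interior temperature quoted in the text
    print(f"rho   = {density(T):.1f} kg m^-3")
    print(f"alpha = {expansivity(T):.2e} K^-1")
    print(f"k     = {conductivity(T):.3f} W m^-1 K^-1")
    print(f"c_p   = {specific_heat(T):.0f} J kg^-1 K^-1")
    print(f"kappa = {diffusivity(T):.2e} m^2 s^-1")
```

At 37 K the density function returns roughly 1,030 kg m⁻³, in line with the value quoted for N2 ice in the discussion of the ‘floating mountains’.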


Parameterized convection model. The Rayleigh number assesses the vigour of 
thermal convection in a fluid layer under the influence of gravity and a downward- 
increasing thermal gradient. The Rayleigh number is essentially the ratio between 
the timescales for conductive cooling and buoyancy-driven overturn of the viscous 
fluid. The conventional expression for the Rayleigh number¹¹ requires knowledge 
of the temperature difference between the top and bottom of the convecting layer. 
Because the thermal gradient in Pluto is a priori unknown, but the surface heat flow 
can be estimated at about 3 mW m⁻² from internal heat production, we employ a 
version of the Rayleigh number based on surface heat flow¹¹: 

$$\mathrm{Ra}_q = \frac{\alpha \rho g L^4 q_s}{\kappa k \eta} \qquad (7)$$

where α is the volume coefficient of thermal expansion, ρ is the fluid density, g is 
the acceleration of gravity (0.62 m s⁻² on Pluto), L is the depth of the convecting 
layer, κ is the thermal diffusivity, k is the thermal conductivity, η is the viscosity (which 
depends strongly on temperature and stress in N2 ice), and q_s is the surface heat 
flow. For Sputnik Planum, we used (see below) measured, temperature-dependent 
values for N2 in equation (7). This version of the Rayleigh number is related 
to the more standard version Ra through the Nusselt number Nu as follows: 
Ra_q = Nu × Ra. The Nusselt number is the ratio between the total heat transported 
by both convection and conduction to conductive heat transport only 
and is given by: 

$$\mathrm{Nu} = \left(\frac{\mathrm{Ra}}{\mathrm{Ra}_{\mathrm{crit}}}\right)^{\beta} \quad \text{for } \mathrm{Ra} > \mathrm{Ra}_{\mathrm{crit}} \qquad (8)$$


The critical Rayleigh number Ra_crit is of order 10³, depending on detailed boundary 
conditions, and β has been measured to be 0.31 over a wide range of Ra values²². 
Equations (7) and (8) together define the ‘parameterized convection model’, 
which has been widely used to model heat transport in planetary mantles¹¹,²³. The 
most important variable in these models is the viscosity of the convecting fluid. 
Unlike ideal liquids, the viscosity of a hot, creeping solid is a sensitive function of 
both deviatoric stress σ and temperature T and is often parameterized by the form²⁴: 

$$\eta_{\mathrm{eff}} = \frac{\sigma^{1-n}}{3A}\, e^{Q/RT} \qquad (9)$$

where Q is the activation enthalpy for creep and R is the gas constant. A is a constant 
that, along with Q and n, must be determined experimentally. The viscosity 
parameters we determined for N2 ice used in equation (9) are activation energy 
Q = 3.5 kJ mol⁻¹, n = 2.2 and A = 3.5 × 10° Pa⁻ⁿ s⁻¹. 

Because the effective viscosity depends strongly on both the temperature and 
stress in the convecting system, which vary widely from place to place, it is impor- 
tant to understand how to define them in a meaningful way. Previous work"! has 
shown that the best choice is to use the mean temperature and buoyancy stress for 
accurate estimates of convective vigour, a choice that we follow here. 
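
The three relations above can be sketched in Python as follows. This is a minimal illustration that assumes equations (7) to (9) have the forms Ra_q = αρgL⁴q_s/(κkη), Nu = (Ra/Ra_crit)^β and η_eff = σ^(1−n) e^(Q/RT)/(3A); the function names are hypothetical, and the pre-exponential constant A is left as a required argument because no default value is assumed here.

```python
import math

R_GAS = 8.314  # molar gas constant, J mol^-1 K^-1


def rayleigh_heat_flow(alpha, rho, g, L, q_s, kappa, k, eta):
    """Heat-flow-based Rayleigh number Ra_q, assumed form of equation (7)."""
    return alpha * rho * g * L**4 * q_s / (kappa * k * eta)


def nusselt(Ra, Ra_crit=1.0e3, beta=0.31):
    """Nusselt number, equation (8).

    Returns 1 (pure conduction) at or below the critical Rayleigh number.
    """
    if Ra <= Ra_crit:
        return 1.0
    return (Ra / Ra_crit) ** beta


def effective_viscosity(sigma, T, A, n=2.2, Q=3.5e3):
    """Effective viscosity in Pa s, assumed form of equation (9).

    sigma: deviatoric stress in Pa; T: temperature in K;
    A: pre-exponential constant in Pa^-n s^-1 (supplied by the caller);
    Q: activation energy in J mol^-1.
    """
    return sigma ** (1.0 - n) / (3.0 * A) * math.exp(Q / (R_GAS * T))


if __name__ == "__main__":
    # A Rayleigh number four orders of magnitude above critical
    print(f"Nu at Ra = 1e7: {nusselt(1.0e7):.1f}")
```

Because n > 1, the viscosity returned by `effective_viscosity` falls as the stress rises, which is why the stress and temperature at which it is evaluated must be chosen with care.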

As they stand, equations (7) to (9) do not define a closed system and more infor- 
mation is required to compute Ra, even given the heat flow and material properties 
of the convecting fluid. The system can, however, be closed by recognizing that the 
mean temperature in a convecting fluid is determined by the surface temperature, 
T_s, heat flow q_s and the temperature drop across the cold (surface) boundary layer, 
whose thickness is itself determined from the Nusselt number. We thus set: 

$$T = T_s + \frac{q_s L}{k(\mathrm{Nu} + 1)} \qquad (10)$$


where we have ignored the adiabatic increase of temperature in the convecting 
layer, a valid approximation for a thin layer, such as the N2-ice deposit in Sputnik 
Planum. Further adding an equation for the average deviatoric stress in convecting 
plumes: 

o= ee | Mest ( 1 1) 
Nu*} 1? 


Equations (7) to (11) now define a closed, if highly nonlinear, system that can 
readily be solved by numerical methods to define most of the properties of the 
convecting layer from the properties of the fluid, the surface heat flow and surface 
temperature. 

Viscosity of N2 ice. Equations (3) to (6) describe all the material parameters in 
equation (7) except for viscosity. The stress-dependent parameters, A and n, within 
the viscosity equation are directly quoted from previous work¹⁵. The temperature-
dependent parameter was determined from existing stress and strain rate (that is, 
viscosity) measurements for solid N2 at 45 K and 56 K (ref. 15). By matching data 
points for the viscosity measured at two temperatures under the same applied strain 
rate, we can solve for the temperature-dependent parameter, Q, in equation (9). 
Velocity of the convecting fluid. The mean velocity of a convecting layer is com- 
puted by comparing the surface heat flow to the rate at which warm fluid moves 
towards the surface and deposits its thermal energy. This equality can be written 
in terms of the Nusselt number Nu as: 


$$v_{\mathrm{conv}} = \frac{\kappa}{L}(\mathrm{Nu} - 1) \qquad (12)$$

where κ is the thermal diffusivity and L is the depth of the convective cell. 
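
A short Python sketch of the velocity relation v_conv = (κ/L)(Nu − 1) and of a surface renewal time follows. The inputs are illustrative assumptions rather than values taken from the model: κ ≈ 1.9 × 10⁻⁷ m² s⁻¹ is what the fitted properties give near 40 K, Nu = 26 is a hypothetical value chosen because it reproduces a velocity near 1.5 cm yr⁻¹ for L = 10 km, and the travel distance of 15 km (about half a polygon width) is likewise an assumption.

```python
SECONDS_PER_YEAR = 3.156e7


def convective_velocity(kappa, L, Nu):
    """Mean convective velocity in m s^-1, equation (12)."""
    return kappa / L * (Nu - 1.0)


def renewal_time_years(distance_m, velocity_m_per_s):
    """Years needed for ice moving at the given speed to travel distance_m."""
    return distance_m / (velocity_m_per_s * SECONDS_PER_YEAR)


if __name__ == "__main__":
    kappa = 1.9e-7   # m^2 s^-1, assumed
    L = 1.0e4        # m, nominal 10-km layer
    Nu = 26.0        # hypothetical Nusselt number
    v = convective_velocity(kappa, L, Nu)
    print(f"velocity     = {v * SECONDS_PER_YEAR * 100:.2f} cm per year")
    print(f"renewal time = {renewal_time_years(15.0e3, v):.2e} years")
```

With these inputs the renewal time comes out close to one million years, the same order as the surface age quoted in the main text.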
Topographic relief of convecting terrain. We estimate the difference in elevation 
h between the upwelling centres of the polygons and their sinking margins by 
equating the buoyancy stress in the convecting fluid to the stress generated by 
topography, ρgh, where ρ is the density and g is Pluto’s surface acceleration of 
gravity (0.62 m s⁻²). The buoyancy stress is equal to the density deficit of the warm, 
rising fluid, ραΔT, where α is the volume coefficient of expansion and ΔT is the 
temperature difference between the hot and cold boundaries of the convecting 
system. The density deficit is multiplied by the height of the convection cell, L, 
times g to define the convective stress, from which we deduce: 

$$h = \alpha \Delta T L \qquad (13)$$

However, ΔT is not known a priori. We can define it more precisely in terms of 
quantities better defined in convecting systems by exploiting the conventional 
definition of the Rayleigh number Ra to solve for ΔT and write: 

$$h = \frac{\kappa\,\mathrm{Ra}\,\eta_{\mathrm{eff}}}{\rho g L^2} \qquad (14)$$

Inserting this expression into our system of parameterized convection equations 
for the nominal case of a heat flow of 3 mW m⁻² and L = 10 km yields an estimate 
of about 80 m for the difference in elevation between the centre and edges of the 
Rayleigh-Bénard cells, as we report in the text. 
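
The relief estimate can be checked with a few lines of Python. The sketch below evaluates h = αΔTL directly and also through the Rayleigh-number form obtained by eliminating ΔT with the conventional definition Ra = ρgαΔTL³/(κη_eff). The inputs are assumptions made for illustration: α ≈ 0.0012 K⁻¹ (the fitted expansivity near 40 K), ΔT = 7 K (the difference between the quoted 40 K interior and 33 K surface, which is not necessarily the ΔT used in the model) and a hypothetical viscosity of 10¹² Pa s.

```python
def relief_from_temperature_drop(alpha, delta_T, L):
    """Topographic relief h = alpha * delta_T * L, equation (13), in metres."""
    return alpha * delta_T * L


def relief_from_rayleigh(Ra, kappa, eta_eff, rho, g, L):
    """Relief with delta_T eliminated using Ra = rho*g*alpha*delta_T*L**3/(kappa*eta_eff)."""
    return Ra * kappa * eta_eff / (rho * g * L**2)


if __name__ == "__main__":
    alpha, delta_T, L = 0.0012, 7.0, 1.0e4          # assumed inputs
    rho, g, kappa, eta = 1030.0, 0.62, 1.9e-7, 1.0e12  # eta is hypothetical
    h_direct = relief_from_temperature_drop(alpha, delta_T, L)
    Ra = rho * g * alpha * delta_T * L**3 / (kappa * eta)
    h_via_Ra = relief_from_rayleigh(Ra, kappa, eta, rho, g, L)
    print(f"h (direct) = {h_direct:.0f} m, h (via Ra) = {h_via_Ra:.0f} m")
```

Both routes give about 84 m for these inputs, close to the estimate of about 80 m reported in the text; the two functions agree by construction, so the second is useful mainly when Ra rather than ΔT is the known quantity.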

The ‘floating mountains’ of Sputnik Planum. Dark material congregates along 
the edges of the Sputnik Planum polygons (see Fig. 1). These hills (currently called 
‘floating mountains’ in NASA press releases) rise hundreds of metres¹ above the 
surrounding terrain within Sputnik Planum. Owing to the albedo contrast with the 
surrounding N2 ice, the material seems likely to be composed of water ice¹. Nearly 
all of the ice chunks are located at the edges of the cells rather than the centres, 
and are arranged in lines and arcs that do not resemble the rims of submerged 
impact craters. Because such a non-random distribution of mountains is unlikely 
to arise by chance, we conclude that the mountains are afloat and are moved by N2 convection to the 
edges of the polygons. Although it is possible that the downwelling limbs of the 
convection cells have aligned with grounded water-ice mountains in thinner N2 
ice, the polygonal arrangement of these mountains at the same distance scale as the 
mountain-free polygons strongly suggests that their arrangement was determined 
by the dynamics of the convection cells rather than vice versa. 

If these ‘floating mountains’ are icebergs, then we can calculate the minimum 
depth of N2 ice beneath each one that is needed to generate strong enough buoyancy 
forces to keep it afloat. According to Archimedes’ principle, the depth of the 
bottom of an iceberg of height h above the surface is: 

$$d = \frac{h}{\rho_N/\rho_w - 1} \qquad (15)$$

where ρ_N is the density of N2 and ρ_w is the density of H2O. At Pluto temperatures of 
~37 K (ref. 1), the density of water ice is ~930 kg m⁻³ (refs 25, 26), and the density 
of N2 is ~1,030 kg m⁻³ (from equation (3)). Using these densities and a height of 
500 m for the iceberg topography, we calculate a minimum depth of 5 km. 
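
The flotation balance behind this estimate is simple enough to verify directly. The sketch below assumes the relation takes the form d = hρ_w/(ρ_N − ρ_w), which follows from requiring that the weight of a water-ice column of height h + d equals the weight of the displaced N2 ice of depth d; the function name is illustrative.

```python
def keel_depth(h, rho_n2=1030.0, rho_water=930.0):
    """Depth (m) of an iceberg's base below the surface for a height h (m) above it.

    Flotation balance: rho_water * (h + d) = rho_n2 * d.
    Densities are in kg m^-3 and default to the values quoted in the text.
    """
    return h * rho_water / (rho_n2 - rho_water)


if __name__ == "__main__":
    print(f"keel depth for h = 500 m: {keel_depth(500.0):.0f} m")
```

For h = 500 m this gives 4,650 m, which rounds to the minimum depth of about 5 km quoted above. The result is sensitive to the small density contrast: a 10% change in ρ_N − ρ_w shifts the depth by roughly the same fraction.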

The horizontal extent of these mountains also gives clues to their depths because 
tall, narrow cylindrical masses of ice are not stable: they would tilt to achieve a 
minimum gravitational energy configuration. The largest observed masses are 
about 5 km across (see Fig. 1), suggesting a minimum N2 ice depth comparable to, 
or greater than, this distance. 

Effect of r_η on the aspect ratio for Rayleigh-Bénard convection. The temperature-
dependence parameter r_η is a non-dimensional ratio between the viscosities at the 
top and bottom of the convection cell that determines the regime of convection 
(transitional mode, stagnant-lid mode or small-viscosity-contrast mode) as well as 
the aspect ratio¹⁴. We show below that the N2-ice layer in Sputnik Planum is well 
within the small-viscosity-contrast regime, so special considerations for convection 
in strongly temperature-dependent fluids do not apply. 
The ratio r_η is given by the following formula’: 

$$r_\eta = \exp[E(T_b - T_s)] \qquad (16)$$

where T_b is the temperature at the bottom of the convection cell and T_s is the surface 
temperature. Within equation (16), the E term is a constant determined by fitting 
the temperature-dependent viscosity equation to rheologic measurements for a 
material¹⁶ and is defined as: 

$$E = \frac{Q}{RT^2} \qquad (17)$$

where T is the mean temperature, R is the gas constant, and Q is the activation 
energy (see equation (9)). Using a temperature of 45 K and the activation energy 
for N2 (see parameters used in equation (9)), we calculated an E constant of ~0.2, 
corresponding to an r_η value of ~15 for N2. This value for r_η places the N2 in 


Sputnik Planum within the small-viscosity-contrast convection regime, where an 
aspect ratio greater than 3:1 is not possible for Ra > 10° (refs 14, 16). 
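
The numbers in this section can be reproduced with the short Python sketch below, assuming E = Q/(RT²) and r_η = exp[E(T_b − T_s)]. The temperature difference of 13 K used in the example is a hypothetical value chosen because it yields a contrast near 15; the text does not state the T_b and T_s that were used.

```python
import math

R_GAS = 8.314  # molar gas constant, J mol^-1 K^-1


def temperature_sensitivity(Q, T):
    """Constant E = Q / (R * T**2) in K^-1; Q in J mol^-1, T in K."""
    return Q / (R_GAS * T**2)


def viscosity_contrast(E, T_bottom, T_surface):
    """Viscosity ratio r_eta = exp(E * (T_bottom - T_surface))."""
    return math.exp(E * (T_bottom - T_surface))


if __name__ == "__main__":
    E = temperature_sensitivity(3.5e3, 45.0)
    print(f"E     = {E:.3f} K^-1")
    # 46 K and 33 K are illustrative boundary temperatures (assumed)
    print(f"r_eta = {viscosity_contrast(E, 46.0, 33.0):.1f}")
```

With Q = 3.5 kJ mol⁻¹ and T = 45 K the sensitivity is about 0.21 K⁻¹, matching the quoted value of ~0.2.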
Convection in a volatile nitrogen-ice-rich layer drives Pluto’s geological vigour 


William B. McKinnon, Francis Nimmo, Teresa Wong, Paul M. Schenk, Oliver L. White, J. H. Roberts, J. M. Moore, 
J. R. Spencer, A. D. Howard, O. M. Umurhan, S. A. Stern, H. A. Weaver, C. B. Olkin, L. A. Young, K. E. Smith & 
the New Horizons Geology, Geophysics and Imaging Theme Team 


The vast, deep, volatile-ice-filled basin informally named Sputnik 
Planum is central to Pluto’s vigorous geological activity¹,². 
Composed of molecular nitrogen, methane, and carbon monoxide 
ices³, but dominated by nitrogen ice, this layer is organized into 
cells or polygons, typically about 10 to 40 kilometres across, that 
resemble the surface manifestation of solid-state convection¹,². Here 
we report, on the basis of available rheological measurements’, 
that solid layers of nitrogen ice with a thickness in excess of about 
one kilometre should undergo convection for estimated present- 
day heat-flow conditions on Pluto. More importantly, we show 
numerically that convective overturn in a several-kilometre-thick 
layer of solid nitrogen can explain the great lateral width of the cells. 
The temperature dependence of nitrogen-ice viscosity implies that 
the ice layer convects in the so-called sluggish lid regime’, a unique 
convective mode not previously definitively observed in the Solar 
System. Average surface horizontal velocities of a few centimetres 
a year imply surface transport or renewal times of about 500,000 
years, well under the ten-million-year upper-limit crater retention 
age for Sputnik Planum/?. Similar convective surface renewal may 
also occur on other dwarf planets in the Kuiper belt, which may help 
to explain the high albedos shown by some of these bodies. 

Sputnik Planum (SP) is the most prominent geological feature on 
Pluto revealed by NASA’s New Horizons mission. It is a ~900,000 km² 
oval-shaped unit of high-albedo plains (Fig. 1a) set within a topographic 
basin at least 2-3 km deep (Fig. 1b). The basin’s scale, depth 
and ellipticity (~1,300 × 1,000 km), and rugged surrounding mountains, 
suggest an origin as a huge impact, one of similar scale to its 
parent body as Hellas on Mars or South Pole-Aitken on the Moon®. The 
central and northern regions of SP display a distinct cellular/polygonal 
pattern (Fig. 1c). In the bright central portion, the cells are bounded 
by shallow troughs locally up to 100 m deep (Fig. 1d), and the centres 
of at least some cells are elevated by ~50 m relative to their edges”. The 
southern region and eastern margin of SP do not display cellular mor- 
phology, but instead show featureless plains and dense concentrations 
of kilometre-scale pits. 

Impact craters have not been confirmed on SP either in New 
Horizons mapping at a scale of 350m per pixel, or in high-resolution 
strips (resolutions as fine as 80 m per pixel). The crater retention age 
of SP is very young, no more than ~10 Myr based on models of the 
impact flux of small Kuiper belt objects onto Pluto’. This indicates 
renewal, burial or erosion of the surface on this timescale or shorter. 
Evidence for all three processes is seen in the form of possible con- 
vective overturn, glacial inflow of volatile ice from higher standing 
terrains at the eastern margin, and likely sublimation landforms 
such as the pits. In addition, the apparent flow lines around obsta- 
cles in northern SP and the pronounced distortion of some fields 


of pits in southern SP are evidence for the lateral, advective flow of 
SP ices! 

From New Horizons spectroscopic mapping, N2, CH4 and CO ice all 
concentrate within Sputnik Planum’. All three ices are mechanically 
weak, van der Waals bonded molecular solids and are not expected to 
be able to support appreciable surface topography over any great length 
of geological time**"!°, even at the present surface ice temperature of 
Pluto (37 K)!. This is consistent with the overall smoothness of SP over 
hundreds of kilometres (Fig. 1b). Convective overturn that reaches the 
surface would also eliminate impact and other features, and below we 
estimate numerically the timescale for SP’s surface renewal. 

Quantitative radiative transfer modelling of the relative surface 
abundances of N2, CH4 and CO ices within SP" shows that N2 ice 
dominates CH4 ice, especially in the central portion of the planum 
(the bright cellular plains) where the cellular structure is best defined 
topographically (Fig. 1d). Ices at depth need not match the surface 
composition, but continuous exposure (such as by convection) makes 
this more likely. N2 and CO ice have nearly the same density (close to 
1.0 g cm⁻³), whereas CH4 ice is half as dense as this”. Hence water-ice 
blocks can float in solid N2 or CO, but not in solid CH4. Water ice has 
been identified in the rugged mountains that surround SP°, and blocks 
and other debris shed from the mountains at SP’s periphery appear to 
be floating”; moreover, glacial inflow appears to carry along water-ice 
blocks, and these blocks almost exclusively congregate at the margins 
of the cells/polygons, consistent with being dragged to the downwelling 
limbs of convective cells (Fig. 2a). This indicates that while CH4 ice 
is present within SP, it is not likely to be volumetrically dominant. In 
terms of convection, we concentrate on the rheology of N2 ice. 

Deformation experiments for N2 ice show mild power-law creep 
behaviour (strain rate proportional to stress to the n = 2.2 ± 0.2 power) 
and a modest temperature dependence of its viscosity*. N2 diffusion 
creep (n = 1) has also been predicted¹⁶,¹⁷, but not yet observed experimentally. 
Convection in a layer occurs if the critical Rayleigh number 
(Ra_cr) is exceeded. The Rayleigh number, the dimensionless measure 
of the vigour of convection, for a power-law fluid heated from below 
is given by’? 

$$\mathrm{Ra} = \frac{\rho g \alpha A^{1/n} \Delta T\, D^{(2+n)/n}}{\kappa^{1/n} \exp(E^*/nRT)} \qquad (1)$$

where D is the thickness of the convecting layer, κ is the thermal diffusivity, 
g is the acceleration due to gravity, ρ the ice layer density, α the 
volume thermal expansivity, ΔT the superadiabatic temperature drop 
across the layer, and A is the pre-exponential constant in the relationship 
between stress and strain-rate, E* is the activation energy of the 
dominant creep mechanism, and R is the gas constant. 
Figure 1 | Image, topographic and map views of Sputnik Planum, Pluto. 
a, Base map showing locations of some figure panels. b, Stereo-derived 
topography, showing that Sputnik Planum (SP) lies within a kilometres-
deep basin (depth coded on greyscale, see key at bottom right). Southwest-
northeast banding and central basin ‘speckle’ are artefacts or noise 
(Methods); elevations are relative. c, Map of troughs (black lines), which 
define cell boundaries (note enlarged scale compared with a and b). 
Cell size increases and/or becomes less well connected towards SP centre, 
consistent with a thickened N2 ice layer there. Aquamarine shading 
indicates ‘bright cellular plains’, within which troughs are topographically 
defined. d, 350 m per pixel MVIC image (position shown in a) that shows 
cellular/polygonal detail (north is to right). 


The critical Rayleigh number depends on the temperature drop and 
the associated change in viscosity’, as deformation mechanisms are 
thermally activated processes. For a given ΔT, the Ra_cr implies a critical 
or minimum layer thickness, D_cr, below which convection cannot 
occur. This is illustrated in Fig. 3 for N2 ice. We assume an average ice 
surface temperature of 36 K set by vapour-pressure equilibrium over an 
orbital cycle’, and an upper limit on the basal temperature set by the 
N2 ice melting temperature of 63 K (ref. 15). From Fig. 3 we conclude 
that convection in solid nitrogen on Pluto is a facile process: critical 
thicknesses are generally low, less than 1 km, as long as the necessary 
temperatures at depth are achieved. 

The temperature profile in the absence of convection is determined 
by conduction. N2 ice has a low thermal conductivity’, which 
together with a present-day radiogenic heat flux for Pluto of roughly 
3 mW m⁻² implies a conductive temperature gradient of ~15 K km⁻¹. 
Over Pluto’s history, radiogenic heat has dominated Pluto’s internal 
energy budget!®!’; we argue that relatively unfractionated, solar 
composition carbonaceous chondrite is the best model for the rock 
component of worlds accreted in the cold, distant regions of the Solar 
Figure 2 | High-resolution images of cellular terrain within SP. 
a, Kilometre-scale hills appear to emanate from uplands to the east 
(at right), and are probably darker water-ice blocks and methane-rich 
debris (arrows) that have broken away and are being carried by denser, 
N2-ice-dominated glaciers into SP, where they become subject to the 
convective motions of SP ice, and are pushed to the downwelling edges 
of the cells at left. b, Part of the highest-resolution image sequence taken 
by New Horizons (80 m per pixel); surface texture (for example, pitting) 
concentrates towards cell boundaries and in regions apparently unaffected 
by convection (such as at right, see text). 


System. The abundances of U, Th, and ⁴⁰K are consistent across the
most primitive individual examples of this meteorite group (the CI
chondrites), to within 15% (ref. 18), and Pluto’s density implies that
about 2/3 of its mass could be composed of solar composition rock
(the rest being ices and carbonaceous material). Nevertheless,
regional and temporal variations in heat flow are possible, so Fig. 3
illustrates the temperatures reached as a function of depth, with the
conclusion being that even under broad variations in heat flow,
temperatures sufficient to drive convection in SP are plausible for N2-ice
layers thicker than ~500 m.
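
The conductive estimate above can be checked with a few lines of arithmetic. The following Python sketch is only an illustration: it assumes a constant conductivity of 0.2 W m⁻¹ K⁻¹ (the value listed in Methods), the nominal 3 mW m⁻² heat flux and the 36 K surface temperature, and the function names are hypothetical.

```python
def conductive_gradient_K_per_km(heat_flux_W_m2, conductivity_W_m_K):
    """Fourier's law, dT/dz = q / k, converted from K per m to K per km."""
    return heat_flux_W_m2 / conductivity_W_m_K * 1000.0


def depth_to_reach_km(t_surface_K, t_target_K, gradient_K_per_km):
    """Depth at which a linear conductive profile reaches t_target_K."""
    return (t_target_K - t_surface_K) / gradient_K_per_km


# Assumed inputs: nominal heat flux and N2-ice conductivity quoted in the text.
gradient = conductive_gradient_K_per_km(3.0e-3, 0.2)
melting_depth = depth_to_reach_km(36.0, 63.0, gradient)

print(f"conductive gradient: {gradient:.1f} K/km")
print(f"depth to the 63 K melting point: {melting_depth:.2f} km")
```

Halving or doubling the heat flux (the dashed lines in Fig. 3) scales the gradient in proportion, so the depth at which a given basal temperature is reached changes by the inverse factor.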

Clearly, the horizontal scale of the cells in SP (Figs 1d, 2a, b) should
reflect the vertical scale (depth) of the SP basin ice fill, but this presents
a problem. For isoviscous Rayleigh-Bénard convection, the aspect
ratio (width/depth) of well-developed convection cells is near unity.
Numerical calculations by us for Newtonian and non-Newtonian
convection in very wide 2D domains, but without temperature-dependent
viscosity, give aspect ratios near 1 (Methods). If the cells/polygons on
Sputnik Planum are the surface expression of convective cells, then
cell diameters (wavelengths λ) of 20-40 km imply depths to the base
of the N2 ice layer in SP of about 10-20 km. This is very deep, and
much deeper than any likely impact basin, especially as the surface of
SP is already at least 2-3 km below the surrounding terrain (Fig. 1b).
The deepest impact basins of comparable scale known on any major
icy world are on Iapetus, a body of much lower density (hence lower

Figure 3 | Minimum thickness for convection in a layer of solid
N2 ice on Pluto, as a function of basal temperature. Convection can
occur above the solid red curve provided a sufficient perturbation
exists (area labelled ‘convection possible’). Limit is based on numerical
and laboratory experiments and theory and creep measurements for
nitrogen ice (Methods). Basal temperatures due to conductive heat flow
(3 mW m⁻²) from Pluto are shown for comparison (solid black line),
along with variations of a factor of 2 in heat flow (dashed black lines).
For approximately present-day chondritic heat flows, basal temperatures
exceed the convective threshold for layer thicknesses in excess of about
500 m. In contrast, the minimum thickness for convection by volume
diffusion creep would plot off the graph to the upper right.


rock abundance and heat flow) and surface gravity than Pluto. Gravity
scaling the depth from basins on Iapetus, we estimate the SP basin was
initially no deeper than ~10 km total (that is, before filling by volatile
ices or any isostatic adjustment).

The solution to this apparent problem (the SP ice thickness
overestimate) is probably the temperature dependence of the N2 ice
viscosity. Given that the maximum ΔT across the SP N2 layer is 27 K, the
maximum corresponding Arrhenius viscosity ratio (Δη) for the
experimentally constrained activation energy is ~150 (Methods); if we adopt
the (larger) activation energy for volume diffusion, this ratio potentially
increases to ~2 × 10⁵. This potential range in Δη strongly suggests that
SP convects in the sluggish lid regime. In sluggish lid convection
the surface is in motion and transports heat, but moves at a much slower
pace than the deeper, warmer subsurface. A defining characteristic of
this regime, depending on Ra_b (the Rayleigh number defined with
the basal viscosity) and Δη, is convection cells with large aspect ratios.
This differs from isoviscous convection in which the aspect ratios are
closer to one, or at the other end of the viscosity contrast spectrum,
stagnant lid convection, in which aspect ratios are again closer to one but
confined (‘hidden’) beneath an immobile, high-viscosity surface layer.
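
The two viscosity ratios quoted above follow from the Arrhenius law η ~ exp(E*/RT) evaluated at the surface and basal temperatures. The sketch below is a minimal check, assuming the activation energies listed in Methods (3.5 and 8.6 kJ mol⁻¹) and the 36 K and 63 K bounding temperatures; the function name is illustrative only.

```python
import math

R_GAS = 8.314  # gas constant, J mol^-1 K^-1


def arrhenius_viscosity_ratio(e_act_J_mol, t_surface_K, t_base_K):
    """eta(T_surface) / eta(T_base) for a viscosity eta ~ exp(E*/RT)."""
    return math.exp(e_act_J_mol / R_GAS * (1.0 / t_surface_K - 1.0 / t_base_K))


# Activation energies as listed in Methods (assumed here, not re-derived).
ratio_measured_creep = arrhenius_viscosity_ratio(3.5e3, 36.0, 63.0)
ratio_volume_diffusion = arrhenius_viscosity_ratio(8.6e3, 36.0, 63.0)

print(f"measured creep (n = 2.2): {ratio_measured_creep:.0f}")
print(f"volume diffusion (n = 1): {ratio_volume_diffusion:.2e}")
```

Both values are upper bounds in the sense used in the text, because they take the full 27 K temperature drop between the surface and the N2 melting point.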

We illustrate such temperature-dependent viscosity convection
numerically, using the finite element code CitCom (a typical example
is shown in Fig. 4). Given that N2-ice rheology is imprecisely known
(unlike well-studied geological materials such as olivine or water
ice), we survey different combinations of Ra_b and Δη in a Newtonian
framework (similar to previous work), but with a rigid (no-slip)
lower boundary condition appropriate to the SP ice layer (Methods).
We find that aspect ratios easily reach values of 2 or 3 (or λ/D of
4 or 6), regardless of initial perturbation wavelength. In such instances
cell dimensions between 20 km and 40 km across could imply a layer
thickness as small as ~3-6 km. We note that while these depths are not
excessive, they are deep enough to carry buoyant, kilometre-scale water
ice blocks. In addition, simulations with a free-slip lower boundary,
which would apply to SP ice that is at or near melting at its base, yield
aspect ratios as great as ~6 (λ/D ~ 12).
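
The link between cell size, aspect ratio and layer depth used above is simple enough to tabulate. This sketch assumes, as the text does, that one wavelength λ spans two cells, so that λ/D is twice the aspect ratio; the function name is hypothetical.

```python
def layer_depth_km(cell_wavelength_km, aspect_ratio):
    """Layer depth D from the cell wavelength, using lambda / D = 2 * aspect ratio."""
    return cell_wavelength_km / (2.0 * aspect_ratio)


for aspect_ratio in (1, 2, 3):
    shallow = layer_depth_km(20.0, aspect_ratio)
    deep = layer_depth_km(40.0, aspect_ratio)
    print(f"aspect ratio {aspect_ratio}: D = {shallow:.1f}-{deep:.1f} km")
```

An aspect ratio of 1 reproduces the 10-20 km depths of the isoviscous case, whereas an aspect ratio of 3 brings the implied depth down to roughly 3-7 km.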

Numerical simulations can be tested against SP observations by
assuming reasonable heat flows (say, chondritic within ±50%) and

Figure 4 | Example numerical model of N2 ice convection in SP.
a, Temperature field showing large-aspect-ratio plumes and downwellings.
Basal Rayleigh number Ra_b = 3 x 10°, viscosity ratio Δη = e⁶ ≈ 400, Nusselt
number (dimensionless heat flow) ~ 3.2. White contour denotes the median
temperature. b-d, Corresponding horizontal surface velocities (b), surface
normal stress and dynamic topography (c), and surface heat flows (d).
Non-dimensional values are shown at left, and dimensional
values at right assuming D = 4.5 km and ΔT = 20 K. The calculated
topography matches the scale seen within the bright cellular plains, and
the average heat flow is consistent with radiogenic heat production in
Pluto’s rock fraction. Norm., normalized.


comparing the resulting dynamic topography with that observed.
Non-dimensional surface horizontal velocity v_x, normal stress σ_zz, and heat
flow q_z for the example calculation are shown in Fig. 4b-d. To
dimensionalize we choose D = 4.5 km and ΔT = 20 K to match the typical
horizontal scale of the cells (for example, nearly 30 km, with a
convective aspect ratio of 3) and give a chondritic heat flow (see Methods).
The dynamic topography due to the thermal buoyancy of the flow is
given by σ_zz/ρg, and its scale is given at the right hand side of Fig. 4c.
This dynamic topography is consistent with available measurements.
Average surface velocities (Fig. 4b) in this example are a few centimetres
per year, which for the horizontal scale of cells on SP translates into a
timescale to transport surface ice from the centre of a given upwelling
to the downwelling perimeter of ~500,000 years. This is well within
the upper limit for the crater retention age for the planum, ~10 Myr
(ref. 2). The surface heat flow variation is also notable, nearly double
the mean over upwellings and close to zero over downwellings. This
means that fine scale topography such as pitting or suncups driven
by N2 sublimation will be much more stable towards cell/polygonal
edges, as the N2 ice there will be as cold and viscous as the surface to
considerable depth, which is consistent with the observations of surface
texture (for example, Fig. 2b). We also find slight topographic dimples
over downwellings in some of our calculations, which may be related
to trough formation at cell edges (Fig. 2b). The troughs themselves,
however, are likely to be finite amplitude topographic instabilities of
the sort seen on icy satellites elsewhere, and are not captured by these
convection calculations given that velocities normal to domain
boundaries are set to zero.
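
The ~500,000-year transport timescale can be reproduced as an order-of-magnitude estimate. The sketch below assumes a centre-to-edge distance of 15 km (half of a roughly 30-km cell) and takes "a few centimetres per year" to mean 3 cm per year; both numbers are illustrative choices, not values stated by the simulation output.

```python
def transport_time_years(distance_km, speed_cm_per_yr):
    """Time to cross distance_km at a constant surface speed."""
    distance_cm = distance_km * 1.0e5
    return distance_cm / speed_cm_per_yr


# Assumed values: half-width of a ~30 km cell and a speed of 3 cm per year.
transit = transport_time_years(15.0, 3.0)
fraction_of_age_limit = transit / 10.0e6  # compare with the ~10 Myr upper limit

print(f"transport time: {transit:,.0f} years")
print(f"fraction of the crater retention age limit: {fraction_of_age_limit:.2f}")
```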

Convection in a kilometres-thick N2 layer within Pluto’s SP basin
thus emerges as a compelling explanation for the remarkable
appearance of the planum surface (Fig. 1). Sputnik Planum covers 5% of
Pluto’s surface, so having an N2 ice layer several kilometres deep is
equivalent to a global layer ~200-300 m thick. This is consistent with
Pluto’s possible total cosmochemical nitrogen inventory, especially as
Pluto’s atmospheric nitrogen escape rate is much lower than previously
estimated. For Pluto, SP acts as an enormous glacial catchment or
drainage basin, the major topographic trap for Pluto’s surficial, flowing N2
ice. SP is essentially a vast, frozen sea, one in which convective turnover
(now, and even more vigorously in the past) continually refreshes the
surface volatile ice inventory. A sealing, superficial lag of less volatile
ices or darker tholins cannot develop, and the atmospheric cycle of
volatile transport is maintained. Moreover, larger Kuiper belt objects
are known to be systematically brighter (more reflective) than their
smaller cousins. Convective renewal of volatile
ice surfaces, as in a basin or basins similar to SP, may be one way in
which the dwarf planets of the Kuiper belt maintain their youthful
appearance.
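
The global-equivalent layer quoted above is the local thickness multiplied by the fraction of Pluto’s surface that SP covers. In this sketch the 4-6 km range is an assumed reading of "several kilometres", chosen because it reproduces the stated 200-300 m; the function name is hypothetical.

```python
def global_equivalent_thickness_m(area_fraction, local_thickness_km):
    """Thickness of a uniform global layer holding the same volume of ice."""
    return area_fraction * local_thickness_km * 1000.0


# Assumed local thicknesses bracketing "several kilometres".
for thickness_km in (4.0, 6.0):
    layer_m = global_equivalent_thickness_m(0.05, thickness_km)
    print(f"{thickness_km:.0f} km over 5% of the surface -> {layer_m:.0f} m globally")
```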


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Mapping and topography. The LORRI basemap in Fig. 1a was created from the
5 × 4 mosaic sequence P_LORRI (890 m per pixel), taken by the New Horizons
Long-Range Reconnaissance Imager (LORRI). Mapping of cell/polygon
boundaries (Fig. 1c) was carried out in ArcGIS using this mosaic and additional images
from P_LORRI_Stereo_Mosaic (390 m per pixel). Figure 1a-c shows simple
cylindrical projections, so the scale bars are approximate. Locations of Fig. 1d and
Fig. 2a, b are shown as insets in Fig. 1a. Figure 2a is part of P_MVIC_LORRI_CA
(MVIC Pan 2, 320 m per pixel), whereas Fig. 2b is a segment of the LORRI portion
of P_MVIC_LORRI_CA, the highest resolution image transect obtained at Pluto
by New Horizons (80 m per pixel).

Stereo topography over Sputnik Planum (SP) and its environs was determined
using the two highest resolution Multispectral Visible Imaging Camera (MVIC)
scans, P_MPan1 (495 m per pixel) and P_MVIC_LORRI_CA (MVIC Pan 2, 320 m
per pixel). As MVIC is a scanning imager, each line must be individually and
carefully registered, and pointing must be accurately known for stereo reconstruction. For
Fig. 1b, Pluto was assumed to be a sphere of 1,187-km radius, and elevations
were determined using an automated stereo photogrammetry method based on
scene-recognition algorithms. Spatial resolutions are controlled by the lower
resolution MVIC scan and, using this method, are further reduced by a factor of three
to five. Vertical precisions can be calculated through standard stereo technique from
m r_p/(tan e1 + tan e2), where m is the accuracy of pixel matching (0.2-0.3), r_p is pixel
resolution, and e1 and e2 are the emission angles of the stereo image pair. For Fig. 1b
the precision is about 230 m, well suited for determining elevations of Pluto’s
mountains and deeper craters as well as the rim-to-floor depth of the SP basin. It is not
sufficient to determine planum cell/polygon elevations. In the planum centre, the
dearth of sufficient frequency topography inhibits closure of the stereo algorithm,
hence the noise in the centre of SP in Fig. 1b.

The subtle topography of the raised cells within SP was determined from a
preliminary photoclinometric (shape from shading) analysis (for example,
ref. 28), and is subject to further refinement of the photometric function for the
bright cellular plains. Photoclinometry offers high-frequency topographic data
at spatial scales of image resolution, but can be poorly controlled over longer
wavelengths. Photoclinometry is sensitive to inherent albedo variations, but can
be especially useful for investigating features with assumed symmetry, such as
impact craters, which allows a measure of topographic control. The ovular domes
and bounding troughs of the bright cellular plains within SP are such symmetric
features, and intrinsic albedo variations are muted in the absence of dark knobs
or blocks, so photoclinometry is well-suited to determining elevations across
individual cells within the bright cellular plains (Figs 1d and 2b).
Critical Rayleigh numbers for convection. Solid state viscosities η generally
follow an Arrhenius law η ~ exp(E*/RT) for any given rheological mechanism,
where E* is the activation energy for the deformation mechanism in question,
R is the gas constant, and T is absolute temperature. For any given temperature and
stress, one deformation mechanism generally dominates over another. Critical
Rayleigh number values Ra_cr for convection, for a layer heated from below with
fixed upper and lower boundary temperatures, depend on the deformation
mechanism (through the power-law exponent n in the stress strain-rate relation) and
the viscosity contrast Δη across the layer due to the temperature difference ΔT.
In what follows we adopt an exponential viscosity law based on a linear expansion
of the Arrhenius law in E*/RT (the Frank-Kamenetskii approximation) to take
advantage of previous theoretical and numerical work. This is also a good
approximation for the problem at hand because the temperature and viscosity
contrast across a layer of volatile ices on Pluto is limited by the surface temperature
of the ices on Pluto (37 K at the time of the New Horizons encounter) and the
melting temperature of N2 ice (63.15 K).

For an exponential viscosity law, the driving (exponential) rheological
temperature scale is ΔT_e ≈ RT_i²/E*, where T_i is a characteristic internal temperature of
the convecting layer. The viscosity ratio across the layer due to temperature is then
defined as Δη = exp(θ) = exp(ΔT/ΔT_e). Ra_cr is then approximated, for large θ and
in which T_i ≈ the basal temperature T_b, by

\[
Ra_{cr}(n,\theta) \approx Ra_{cr}(n)\left[\frac{\exp(1)\,\theta}{4(n+1)}\right]^{2(n+1)/n} \qquad (2)
\]



where Ra_cr(n) is the critical Rayleigh number for non-Newtonian viscosity with
no temperature dependence (2,038 for n = 1 and 310 for n = 2.2, based on
numerical results for rigid upper and lower layer or sublayer boundaries). For
large θ, convection occurs in the stagnant lid regime, in which convective motions
are limited to a sublayer below a rigid surface. This is not the regime SP operates
in, but serves as a limiting case. The transition from stagnant lid to sluggish lid
convection, which does apply to SP, occurs at θ ≈ 9, or Δη ≈ 10⁴, for n = 1, and
at θ ≈ 13.8, or Δη ≈ 10⁶, for n = 2.2 (ref. 13). The other convective regime limit
is that of small viscosity contrast (Δη → 1). For SP, with a rigid lower boundary
and a free-slip upper boundary, Ra_cr in this limit should be 1,101 (ref. 34) and
~200 (estimate) for n = 1 and 2.2, respectively. We then estimate Ra_cr(n, θ) for
the sluggish lid regime, following refs 13 and 30, by linearly extrapolating in
log Δη-log Ra_b space between the small viscosity contrast limit and the transition
to stagnant lid convection:


\[
Ra_{cr}(1,\theta) \approx 1{,}100\,\exp(\theta/1.78) \qquad (3a)
\]

\[
Ra_{cr}(2.2,\theta) \approx 200\,\exp(\theta/3.87) \qquad (3b)
\]
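
The sluggish lid fits in equations (3a) and (3b) are meant to join the stagnant lid expression of equation (2) at the transition values of θ. The sketch below checks that numerically. It assumes equation (2) has the standard asymptotic form Ra_cr(n)[exp(1)θ/(4(n+1))]^(2(n+1)/n) with Ra_cr(1) = 2,038 and Ra_cr(2.2) = 310; the function names are illustrative.

```python
import math


def ra_cr_stagnant_lid(theta, n, ra_cr_n):
    """Assumed form of equation (2): Ra_cr(n) * [e*theta / (4(n+1))]^(2(n+1)/n)."""
    base = math.e * theta / (4.0 * (n + 1.0))
    return ra_cr_n * base ** (2.0 * (n + 1.0) / n)


def ra_cr_sluggish_lid(theta, n):
    """Equations (3a) and (3b); fits are given only for n = 1 and n = 2.2."""
    if n == 1:
        return 1100.0 * math.exp(theta / 1.78)
    if n == 2.2:
        return 200.0 * math.exp(theta / 3.87)
    raise ValueError("no sluggish lid fit is given for this n")


# Compare the two expressions at the quoted transition points.
newtonian_ratio = ra_cr_stagnant_lid(9.0, 1, 2038.0) / ra_cr_sluggish_lid(9.0, 1)
power_law_ratio = ra_cr_stagnant_lid(13.8, 2.2, 310.0) / ra_cr_sluggish_lid(13.8, 2.2)

print(f"n = 1,   theta = 9:    stagnant/sluggish = {newtonian_ratio:.3f}")
print(f"n = 2.2, theta = 13.8: stagnant/sluggish = {power_law_ratio:.3f}")
```

Both ratios come out within a few per cent of unity, which is what a log-linear interpolation anchored at the transition should give.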


The minimum or critical volatile ice layer thickness D_cr, above which convection
can occur and below which it cannot, follows as

\[
D_{cr} = \left[\frac{Ra_{cr}\,\kappa^{1/n}\exp(E^{*}/nRT_i)}{3^{(n+1)/2n}\,A^{1/n}\,\rho g \alpha\,\Delta T}\right]^{n/(n+2)} \qquad (4)
\]

where κ, ρ, and α are, respectively, the thermal diffusivity, density, and volume
thermal expansion coefficient of the ice, g is the surface gravity, and A is the pre-exponential coefficient
in the stress strain-rate relationship. For N2 ice, this is either measured directly or
estimated theoretically. The numerical factor in the denominator comes from the
definition of viscosity and the conversion from laboratory geometry (A is measured
in uniaxial compression) to the generalized flow law. For sluggish lid convection,
we approximate T_i as T_b − ΔT/2, which is a slight underestimate for the problem
under discussion, but one that makes D_cr in equation (4) an upper bound on the
minimum thickness for convection.

Equation (4) does not explicitly depend on ice grain size d. The power-law
exponent reported for nitrogen ice deformation (n ~ 2.2) suggests a
grain-size-sensitive regime such as grain boundary sliding, as opposed to a purely dislocation
creep or climb mechanism (which would be grain-size independent). Grain
sizes in the nitrogen ice deformation experiments were not reported, but it was
noted that the grain sizes of similar experiments on methane ice were a few millimetres.
This is a not atypical grain size for convecting upper mantle rock, or deep polar
glacial ice on Earth, and is plausible for convecting water ice within icy satellites
of the outer Solar System, so without further information we utilize the
deformation experiment results for nitrogen as is. Notably, however, in order for N2
ice to be identified spectroscopically at all on Pluto, very long optical path lengths
are required (>>1 cm), so the grain sizes of the convecting ice within SP may
be much larger than a few millimetres. Because grain-size-sensitive rheologies
typically have viscosities proportional to d² or d³, the presumed N2 ice in SP may
be much more viscous than in the reported experiments. On the other hand, the
presence of convective cells in SP implies that the viscosity is not arbitrarily large.
Grain sizes in the annealed, convecting ice are probably determined by stress levels
and the presence of contaminants (such as bits of water ice or tholins) and minor
phases (such as CH4-rich ice). Diffusion creep is also grain-size dependent, and
in evaluating N2 diffusion creep for comparison with Fig. 3 we adopt d = 1 mm as
a nominal value, noting that for volume diffusion D_cr scales as d^(2/3). The minimum
thickness for convection by volume diffusion would plot off the graph in Fig. 3 to
the upper right for d = 1 mm. Only if d were much smaller would D_cr for volume
diffusion be comparable to that shown in Fig. 3.

Regarding the potential role of CO ice in SP, we note the near-perfect solid
solution between solid N2 and CO, and close similarities in density, melting
temperature and electronic structure. Hence, if the deeper ice in SP were actually
dominantly CO, it would behave much the same as pure N2 ice, with the proviso
that an N2-CO ice solid solution under Pluto conditions would, for CO fractions
greater than 10%, crystallize in the ordered α-phase, as opposed to the disordered
β-phase of N2. We expect α-phase CO to be stiffer than its β-phase counterpart,
based on the viscosity differences between ordered and disordered water ice
phases. We stress, however, that the surface of SP, whatever its precise
composition, is itself not in the α-phase, for if so the 2.16-μm N2 absorption feature would
not be observed.

Regarding the potential role of CH4 ice in SP, deformation experiments indicate
similar behaviour to that of N2 ice, but CH4 ice appears to be about 25 times more
viscous than N2 ice (that is, A is ~25 times larger at the same T and differential
stress), and with a similar power-law index n. The minimum or critical D_cr for
convection within SP from equation (4) would then be about double that in Fig. 3
if SP were in fact filled with CH4 ice, so the convection hypothesis is just as valid
for CH4 ice as for N2 ice. The geological and compositional data point to an
N2-dominated layer, however, as discussed in the main text.

Applying rheological data obtained in laboratory conditions to geological
problems often requires extrapolation to different stress and strain conditions. For
convection these conditions are lower stresses and strain rates. This is true whether
one is modelling convection in the mantle of the Earth or another terrestrial
planet (with peridotite), in the icy satellites of the giant planets (with water ice), or
in the present case of Sputnik Planum (with volatile ices such as N2). The
extrapolation is valid if the same stress mechanism or mechanisms dominate at the
extrapolated conditions. The n values reported for laboratory deformation of
N2 ice and CH4 ice are low enough (2.2 ± 0.2 and 1.8 ± 0.2, respectively) that it
seems implausible that some power-law, dislocation mechanism (n ~ 3-5) becomes
dominant at lower stresses. Rather, the only likely transition would be, depending
on T, to volume or grain-boundary diffusion (n = 1), which we already consider.
Regardless, our understanding of N2 and other volatile ice rheology could be greatly
improved, especially any dependence on grain size.

Solid N2 material parameters for Fig. 3 are as follows: κ = 1.33 × 10⁻⁷ m² s⁻¹,
α = 2 × 10⁻³ K⁻¹, E* = 3.5 kJ mol⁻¹ (n = 2.2), E* = 8.6 kJ mol⁻¹ (n = 1),
A = 3.73 x 10°? Pa^-2.2 s⁻¹ (n = 2.2), A = 1.52 × 10⁻⁷ × (d/1 mm)⁻² × (T/50 K)⁻¹
Pa⁻¹ s⁻¹ (n = 1), ρ = 1,000 − 2.14(T − 36 K) kg m⁻³, and for the heat flow
calculations, conductivity k = 0.2 W m⁻¹ K⁻¹ (refs 4, 15, 21). Pluto’s surface gravity is
0.617 m s⁻² (ref. 1).
Convection simulations. Numerical convection calculations were carried
out with the well-benchmarked fluid dynamics finite element code CitCom.
CitCom solves the equations of thermal convection of an incompressible fluid in
the Boussinesq approximation and at infinite Prandtl number. CitCom can solve
the thermal convection equations using an Arrhenius viscosity or an exponential
law (the Frank-Kamenetskii approximation). We used this latter approximation
here, for both Newtonian (stress-independent) and non-Newtonian viscosities,
to best compare our results with those in the literature.

We first simulated solid state convection with a Rayleigh number Ra = 2 × 10⁴
but with a non-temperature-dependent viscosity, in a very wide, rectangular 32 × 1
domain, with 2,048 × 64 elements, to allow natural selection of convection cell
aspect ratios (widths of convective cells divided by layer depth). Temperatures at
the top and bottom of the domain were fixed. Free slip was assumed at the
surface, no slip at the base (the volatile ice layer is in contact with a rigid, water-ice
basement), and periodic, free-slip boundary conditions along the sides of the
domain. Velocities normal to domain edges in all cases were zero. Simulations
were allowed to reach steady state. Calculations were carried out for Newtonian,
isoviscous flow, and for non-Newtonian (n = 2.2) flow, both with the same Rayleigh
number. In both cases the planforms were characteristic of their entire respective
domains, and the aspect ratios for the convective cells for both simulations were
close to 1, as expected from theory and previous results. (For example, the
critical wavelength at Ra = Ra_cr for a plane layer heated from below, with boundary
conditions appropriate for convection within SP, is 2.34 times the layer depth.)

A suite of calculations was then carried out at a variety of Ra_b and top-to-bottom
viscosity ratios Δη = exp(θ) = exp(E*ΔT/RT_i²), where Ra_b is defined as the basal
Ra (that is, T in equation (1) of the main text = T_b). Rectangular 12 × 1 domains,
with 768 × 64 elements, were used, with the same boundary conditions as above. A
smaller number of calculations were also run with a free-slip lower boundary, for
benchmarking with examples presented in ref. 5, and to simulate convection where
the SP ice is at or near melting at its base. All runs in this suite were Newtonian, and
while convective aspect ratios were not predictable from theory alone, they were
expected to be much greater than 1 (ref. 5). In all cases simulations were allowed
to reach steady state, or if time-dependent, to reach characteristic state behaviour.

Our present survey covers a range of Ra_b between 10‘ and 10°, and a range in
Δη between 150 and 3,000. This reflects our judgment that the convective regime
represented by the cells in SP ranges from the obviously convectively unstable to
the subcritical (that is, stable) at the periphery of the basin (for example, Fig. 2b).
The transition from cellular to non-cellular plains could reflect several things,
including shallowing of the volatile ice layer, lower heat flow, and in the case of
non-Newtonian flow, an insufficient initial temperature perturbation. The
simplest explanation, however, for smaller cell sizes with distance from the centre
of SP (Fig. 1c), and then a transition to level plains (no cells) towards the south
(for example, Fig. 2b), is that the SP basin is shallower towards its margins, and
particularly shallow towards its southern margin. This is consistent with the expected
basin topography created by an oblique impact to the SSW. The less well defined
cellular structure in the very centre of SP may, in contrast, reflect the deeper centre
of the basin, implying a larger Ra for the N2 ice layer there and more chaotic,
time-dependent convection.

Our numerical simulations are carried out in terms of dimensionless
parameters, and do not presuppose any particular values for the depth of the SP volatile
ice layer or Pluto’s heat flow, and so on. They can be dimensionalized to determine
if various measurable or estimable quantities are matched or are at least
self-consistent. Depths and lengths scale as D, velocities as κ/D, stresses as η_b κ/D²
(η_b is the basal viscosity), and heat flows as kΔT/D (ref. 22). For example, for a
given simulation, D can be scaled from surface cell size. Then different heat flows
imply different ΔT. At fixed D and Ra_b, η_b, stresses, and dynamic topography all
scale with ΔT.
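
The scalings above can be applied directly to the example quoted in the main text. The sketch below is a rough consistency check, assuming D = 4.5 km, ΔT = 20 K, a Nusselt number of 3.2, and the conductivity and diffusivity listed with the material parameters (k = 0.2 W m⁻¹ K⁻¹, κ = 1.33 × 10⁻⁷ m² s⁻¹); the function names are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def heat_flow_scale_mW_m2(conductivity_W_m_K, delta_t_K, depth_m):
    """Conductive heat flow scale k * dT / D, in mW m^-2."""
    return conductivity_W_m_K * delta_t_K / depth_m * 1000.0


def velocity_scale_cm_per_yr(diffusivity_m2_s, depth_m):
    """Velocity scale kappa / D, converted from m/s to cm/yr."""
    return diffusivity_m2_s / depth_m * SECONDS_PER_YEAR * 100.0


depth_m = 4500.0   # assumed layer depth D
delta_t_K = 20.0   # assumed temperature drop across the layer
nusselt = 3.2      # dimensionless mean heat flow of the example run

mean_heat_flow = nusselt * heat_flow_scale_mW_m2(0.2, delta_t_K, depth_m)
velocity_unit = velocity_scale_cm_per_yr(1.33e-7, depth_m)

print(f"mean surface heat flow: {mean_heat_flow:.2f} mW/m^2")
print(f"one non-dimensional velocity unit: {velocity_unit:.3f} cm/yr")
```

The mean heat flow comes out close to the nominal 3 mW m⁻² chondritic value, which is the self-consistency the dimensionalization is meant to test.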

Code availability. CitCom is freely available, in the version CitComS, released 
under a General Public License and downloadable from the Computational 
Infrastructure for Geodynamics (http://geodynamics.org). 
Attosecond nonlinear polarization and light-matter energy transfer in solids
Electric-field-induced charge separation (polarization) is the most
fundamental manifestation of the interaction of light with matter
and a phenomenon of great technological relevance. Nonlinear
optical polarization produces coherent radiation in spectral
ranges inaccessible by lasers and constitutes the key to
ultimate-speed signal manipulation. Terahertz techniques have provided
experimental access to this important observable up to frequencies
of several terahertz. Here we demonstrate that attosecond
metrology extends the resolution to petahertz frequencies
of visible light. Attosecond polarization spectroscopy allows
measurement of the response of the electronic system of silica to
strong (more than one volt per ångström) few-cycle optical (about
750 nanometres) fields. Our proof-of-concept study provides
time-resolved insight into the attosecond nonlinear polarization and the
light-matter energy transfer dynamics behind the optical Kerr effect
and multi-photon absorption. Timing the nonlinear polarization
relative to the driving laser electric field with sub-30-attosecond
accuracy yields direct quantitative access to both the reversible
and irreversible energy exchange between visible-infrared light
and electrons. Quantitative determination of dissipation within
a signal manipulation cycle of only a few femtoseconds duration
(by measurement and ab initio calculation) reveals the feasibility of
dielectric optical switching at clock rates above 100 terahertz. The
observed sub-femtosecond rise of energy transfer from the field to
the material (for a peak electric field strength exceeding 2.5 volts per
ångström) in turn indicates the viability of petahertz-bandwidth
metrology with a solid-state device.

Matter responds to electromagnetic radiation by a displacement of its
electrons with respect to the nuclei, turning its atomic constituents into
dipole antennas. The overall strength of these dipoles per unit volume is
characterized by the polarization vector, P. Its dependence on the
incident electric field, E(t), describes the macroscopic material response.
Its nonlinear component, P_NL, constitutes the basis for manipulating
the electronic and optical properties with the electric field of light.
The energy transferred from the electromagnetic field to the medium
per unit volume can be expressed as:

\[
W(t) = \int_{-\infty}^{t} E(t')\,\frac{d}{dt'}P_{NL}(t')\,dt' \qquad (1)
\]

Here we assume that the contribution of linear polarization to W(t) is
negligible. This is a prerequisite for ultrahigh-rate signal manipulation,
which relies on low dissipation. In fact, it is this dissipation that has
limited the clock rate in contemporary integrated digital electronics to
several gigahertz for more than a decade.
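
To make the energy-transfer integral concrete, the sketch below evaluates W(t), the running integral of E dP_NL, for a toy case. Everything in it is an assumption for illustration: a unit-amplitude Gaussian pulse at 750 nm, an instantaneous cubic (Kerr-like) response P_NL ∝ E³ in arbitrary units, and hypothetical function names. It is not the measured polarization.

```python
import math


def gaussian_pulse(t_fs, duration_fs=4.0, wavelength_nm=750.0):
    """Unit-amplitude few-cycle field; time in femtoseconds."""
    omega = 2.0 * math.pi * 299.792458 / wavelength_nm  # rad/fs, c in nm/fs
    return math.exp(-(t_fs / duration_fs) ** 2) * math.cos(omega * t_fs)


def energy_transfer(field, polarization):
    """Cumulative W(t) = integral of E dP_NL, trapezoidal rule in E."""
    w = [0.0]
    for i in range(1, len(field)):
        e_mid = 0.5 * (field[i] + field[i - 1])
        w.append(w[-1] + e_mid * (polarization[i] - polarization[i - 1]))
    return w


times = [-15.0 + 0.005 * i for i in range(6001)]  # -15 fs to +15 fs
field = [gaussian_pulse(t) for t in times]
p_kerr = [e ** 3 for e in field]  # assumed instantaneous cubic response
w = energy_transfer(field, p_kerr)

peak_w = max(w)
final_w = w[-1]
print(f"peak W = {peak_w:.3f}, W after the pulse = {final_w:.2e} (arbitrary units)")
```

For this lossless response the integrand is an exact differential, so W(t) tracks (3/4)E(t)⁴ and returns to zero once the pulse has passed: the exchange is fully reversible. A response that lags the field or depends on its history would leave a non-zero residual, which is the irreversible part of the transfer.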

A substantial increase of the electronic processing speed requires
a new paradigm that is capable of greatly reducing the dissipation
per switching cycle. Recent experiments indicated a possible way of
advancing contemporary microwave electronics to the frequency
of visible light by manipulating the electronic and optical properties
of wide-bandgap materials with strong visible light fields at photon
energies much smaller than the bandgap of the material. However,
the crucial question of how the energy density deposited irreversibly
per switching cycle, W_irreversible, relates to the reversible energy exchange
per unit volume, W_reversible, could not be answered. Pushing the frontiers
of information processing to optical frequencies requires minimizing
W_irreversible while keeping W_reversible high enough for reliable signal
processing. Insight into field-matter energy exchange at optical frequencies
requires access to W(t) on a sub-femtosecond scale.

To this end, we propagated a strong, linearly polarized field E(t) and
its strongly attenuated replica E_ref(t) = βE(t) through a thin sample of
a transparent wide-bandgap material, in our case fused silica, of
thickness ℓ. The attenuation factor β is sufficiently small to prevent any
observable nonlinear material response to E_ref(t). Both transmitted
waveforms are recorded in a measurement sequence as outlined in
Fig. 1, once attenuated after and once attenuated before the sample by
the same attenuation factor β. We show in Supplementary Information
section 1 how a difference between these transmitted waves,
ΔE(t) = E(ℓ, t) − β⁻¹E_ref(ℓ, t), directly yields the nonlinear polarization
P_NL(t) induced by the strong field E(t); see Supplementary
equation (13).

In our experiments, we focus few-cycle near-infrared waveforms
carried at a wavelength of λ = 750 nm into thin fused silica samples
(ℓ = 10 μm); for details of the experimental setup and procedures see
Supplementary Information section 2. The focus of the transmitted
waveform (that is, the interaction region) is imaged into an attosecond
streak camera. Here the temporal evolution of the transmitted
electric fields E(z = ℓ, t) and E_ref(z = ℓ, t) (henceforth referred to as E(t)
and E_ref(t)) is sampled with sub-250-as extreme-ultraviolet pulses. The
peak intensities I_peak of the strong and attenuated waves have been set
to (1.3 ± 0.1) × 10¹⁴ W cm⁻² and (6.7 ± 0.3) × 10¹² W cm⁻²,
respectively. Figure 2a compares the transmitted fields and reveals the
evolution of the nonlinear phase shift, Δφ_NL(t), induced by the strong
field. Δφ_NL(t) increases towards the pulse peak, tapers off on its tail
and finally vanishes; see insets to Fig. 2a. For E_peak ≈ 2.6 ± 0.1 V Å⁻¹
the induced phase shift at the field maximum amounts to Δφ_max =
0.7 ± 0.1 rad, which translates into a change of the refractive index by
Δn ≈ (0.9 ± 0.1) × 10⁻².

The field-induced phase shift evaluated at the pulse centre, Δφ_peak,
is depicted in Fig. 2b and exhibits a linear scaling with the applied
peak intensity. In contrast to previous research, our time-resolved
study reveals the absence of saturation of the optical Kerr effect up to
E_peak ≈ 2.7 V Å⁻¹, close to the threshold for dielectric breakdown for
few-cycle laser pulses. The Kerr nonlinearity therefore appears to be
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Figure 1 | Attosecond spectroscopy of the nonlinear polarization. 
To induce a nonlinear material response Py, (t), the incident strong 
field E(0, t) is transmitted through the sample, and subsequently its 
amplitude is decreased by the attenuation factor (3 before sampling the 
transmitted electric field waveform in a streak camera setup (I). The 
nonlinear polarization response is ‘deactivated’ by attenuating the 


potentially suitable for petahertz-scale signal manipulation and metrology
beyond critical fields $E_{\mathrm{crit}} \approx \Delta_g/(ea)$ (where $\Delta_g$ denotes the
bandgap, $e = |e|$ is the elementary charge, and $a$ is the lattice period; for
silica, $E_{\mathrm{crit}} \approx 2$ V Å⁻¹), provided that dissipation originating from carriers
promoted into the conduction band during the nonlinear interaction

Figure 2 | Sub-femtosecond-resolved optical Kerr effect in silica.
a, After passage through a 10-µm-thick fused silica sample, the electric
field $E(t)$ of the few-cycle near-infrared pulse with a peak intensity of
$1.3 \times 10^{14}$ W cm⁻², approximately 10% below the threshold for optical
damage, is modified as a result of the nonlinear light-matter interaction,
as revealed by its comparison to a low-intensity ($I_{\mathrm{peak}} = 7 \times 10^{12}$ W cm⁻²)
reference waveform $E_{\mathrm{ref}}(t)$ (for $\beta = 0.27$). This comparison yields a
transient positive phase shift induced by the strong field, as anticipated
from the dynamic increase of the refractive index owing to the optical
Kerr effect. The two insets show close-ups of the comparison near the
centre and at the end of the pulse, revealing the full reversibility of the
effect. $E(t)$ and $E_{\mathrm{ref}}(t)$ are obtained from averaging a set of three recordings
performed under identical conditions on individual samples. b, The phase
shift $\Delta\varphi_{\mathrm{peak}}$ evaluated at the peak of the field envelope for different peak
intensities $I_{\mathrm{peak}}$ of $E(t)$ is found to exhibit a linear dependence on the field
intensity. Each data point represents the mean value of three individual
recordings under identical conditions; the error bars indicate the standard
deviation.

incident field before the sample and transmitting the weak reference field
$E_{\mathrm{ref}}(0, t) = \beta E(0, t)$ through the medium under scrutiny (II). The difference
between the output waveforms, $\Delta E(t) = E(\ell, t) - \beta^{-1} E_{\mathrm{ref}}(\ell, t)$, directly
yields the nonlinear polarization of the medium, $P_{\mathrm{NL}}(t)$, see Supplementary
Information section 1. The false-colour plot shows a typical attosecond
streaking spectrogram of the transmitted waveform used in the experiments.


remains low. Although no lasting negative phase shift indicative of
residual conduction band population is observable at the trailing edge
of the waveform (where the Kerr effect vanishes), an accurate determination
of the resultant $W_{\mathrm{irreversible}}$ and the related $W_{\mathrm{reversible}}$ requires
evaluation of $P_{\mathrm{NL}}(t)$.

The difference $\Delta E(t) = E(t) - \beta^{-1} E_{\mathrm{ref}}(t)$ yields $P_{\mathrm{NL}}(t)$ via
Supplementary equation (13) (for details, see Supplementary
Information section 1). Figure 3 depicts $P_{\mathrm{NL}}(z = \ell/2, t)$ along with
$E(z = \ell/2, t)$, both numerically propagated to the middle of the sample
where their relative timing can be most precisely determined (see
Supplementary Information section 1) for $E_{\mathrm{peak}} = 2.6 \pm 0.1$ V Å⁻¹.
$P_{\mathrm{NL}}(t)$ oscillates almost perfectly in phase with $E(t)$, indicating a
dominant role of bound electrons. This is in strong contrast to the response
of free electrons appearing in the ionization of neon atoms in the gas
phase, exhibiting a 90° phase shift with respect to the driving field
(Supplementary Information section 3). A closer inspection reveals
that $P_{\mathrm{NL}}(t)$ lags slightly behind $E(t)$ on the front edge and the peak of
the pulse, indicating, according to equation (1), energy transfer from
the field to the electronic system of fused silica. Both the lag and the
direction of the energy transfer reverse sign on the trailing edge of the pulse.

The response time of the polarizing electronic system, $\tau_{\mathrm{response}}$, can
be evaluated from the central zero-crossing of the fields (Fig. 3, upper
left panel) as $\tau_{\mathrm{response}} \approx 80$ as for $E_{\mathrm{peak}} = 2.6 \pm 0.1$ V Å⁻¹. This is smaller
than estimates from the Bohr orbit time and from $\chi^{(3)}$ measurements
in the range of 0.1-1 fs and decreases further with decreasing intensity,
to well below 40 as for $E_{\mathrm{peak}} < 2.2$ V Å⁻¹, as displayed in Fig. 3b. This can
be understood by connecting $\tau_{\mathrm{response}}$ to the nonlinear (field-induced)
absorption coefficient, $\alpha_{\mathrm{NL}}$. For $\tau_{\mathrm{response}}$ much smaller than the laser
period, equation (1) yields a simple linear relationship, $\alpha_{\mathrm{NL}} \propto \tau_{\mathrm{response}}$.
For multi-photon absorption, $\alpha_{\mathrm{NL}}$ scales highly nonlinearly with
the intensity and, according to this relationship, so does $\tau_{\mathrm{response}}$.
Supplementary Information section 4 presents detailed modelling of
the intensity scaling of $\tau_{\mathrm{response}}$ as well as a derivation of $\alpha_{\mathrm{NL}}(\tau_{\mathrm{response}})$.
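
The zero-crossing evaluation of the response time can be sketched as follows. This is an illustrative reconstruction, not the procedure of the Supplementary Information: the synthetic waveform, the built-in 80 as lag and the helper names `central_zero_crossing` and `response_time` are assumptions.

```python
import numpy as np

def central_zero_crossing(t, y):
    """Zero crossing of y nearest to t = 0, located by linear interpolation."""
    idx = np.where(np.signbit(y[:-1]) != np.signbit(y[1:]))[0]
    crossings = t[idx] - y[idx] * (t[idx + 1] - t[idx]) / (y[idx + 1] - y[idx])
    return crossings[np.argmin(np.abs(crossings))]

def response_time(t, field, polarization):
    """Delay of the polarization relative to the field at the central zero crossing."""
    return central_zero_crossing(t, polarization) - central_zero_crossing(t, field)

t_fs = np.linspace(-5.0, 5.0, 20001)               # 0.5 as sampling step

def waveform(time_fs):
    """Synthetic few-cycle waveform with a 2.5 fs carrier period (assumed shape)."""
    return np.exp(-(time_fs / 3.0) ** 2) * np.sin(2.0 * np.pi * time_fs / 2.5)

# The polarization is modelled as the field delayed by 0.08 fs (80 as).
tau_fs = response_time(t_fs, waveform(t_fs), waveform(t_fs - 0.08))
print(tau_fs)
```

In a real measurement the achievable accuracy would be set by the timing jitter and noise of the two waveforms rather than by the interpolation.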

Rendering $P_{\mathrm{NL}}(t)$ an experimental observable, attosecond polarization
spectroscopy allows us to explore the intricate dynamic exchange
of energy during nonlinear light-matter interactions. Inserting the
measured values of $P_{\mathrm{NL}}(t)$ and $E(t)$ into equation (1) provides direct
experimental access to the work $W(t)$ done on the electrons by the
laser field per unit volume, that is, the energy density transferred from
the field to the electronic system. Figure 4a plots the measured $W(t)$
for several different peak intensities and the results of time-dependent
density functional theory (TD-DFT) modelling (see inset to Fig. 4,
ref. 26 and Supplementary Information section 5). We find very good
qualitative agreement between theory and experiment regarding all
observables analysed, including the behaviour of the maximum phase
shift, the change in refractive index and the evaluated amount of
dissipated energy. Quantitative agreement is achieved only when the

Figure 3 | The nonlinear optical polarization response of silica at critical
field strengths. a, The strong electric field numerically back-propagated to
the centre of the fused silica sample ($z = \ell/2 = 5$ µm) is contrasted with the
nonlinear polarization $P_{\mathrm{NL}}(t)$ evaluated from $E(t)$ and $E_{\mathrm{ref}}(t)$ (Fig. 2a) at the
same position (see Supplementary Information section 1). The response
time of the nonlinear polarization near optical breakdown is found to be
about 100 attoseconds at the pulse peak (close-up in the top inset). The other
two insets display the computed spatial rearrangement of the electron
density distribution for two extrema of the electric field at instants $t_1$ and $t_2$
in false-colour representation (red indicates an increase and blue a decrease
relative to the unperturbed state). Electrons located in the vicinity of the
oxygen atoms appear to dominate the polarization response, whereas the


theoretically employed peak electric field is adjusted to values approx- 
imately 20% larger. This discrepancy can be attributed to inaccuracies 
of the exchange-correlation potential used in the TD-DFT calculations. 

In all cases, the energy transferred from the field to the material 
increases up to the pulse peak and slightly beyond. This is because 
the field, while growing, needs to do ever more work to remove the 
electrons ever farther from their field-free location. The field ampli- 
tude decreasing after the pulse peak allows the displaced electrons to 
return gradually to their equilibrium position and radiate a part of 
the absorbed energy back into the driving laser field. This results in a 
negative slope for $W(t)$. The positive and negative slopes are connected
to the phase lag and phase advance of $P_{\mathrm{NL}}(t)$ with respect to its driving
field $E(t)$ before and after the pulse peak, respectively; these are clearly
discernible in Fig. 3.
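
The link between phase lag and net energy transfer can be made concrete with a short numerical sketch. Equation (1) is not reproduced in this section, so the form $W(t) = \int_{-\infty}^{t} E(t')\,\frac{dP}{dt'}\,dt'$ used below is an assumption, as are the synthetic waveforms, the constant 10 as lag and the arbitrary units.

```python
import numpy as np

def work_density(t, field, polarization):
    """Cumulative W(t) = integral of E * dP/dt' (finite differences, trapezoid rule)."""
    integrand = field * np.gradient(polarization, t)
    steps = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t)
    return np.concatenate(([0.0], np.cumsum(steps)))

t_fs = np.linspace(-10.0, 10.0, 20001)
omega = 2.0 * np.pi / 2.5                      # 2.5 fs carrier period

def waveform(time_fs):
    """Synthetic few-cycle waveform in arbitrary units (assumed shape)."""
    return np.exp(-(time_fs / 3.0) ** 2) * np.cos(omega * time_fs)

field = waveform(t_fs)
w_in_phase = work_density(t_fs, field, waveform(t_fs))         # P follows E exactly
w_lagging = work_density(t_fs, field, waveform(t_fs - 0.01))   # P lags E by 10 as

w_irreversible = w_lagging[-1]                     # energy left in the medium
w_reversible = w_lagging.max() - w_irreversible    # energy borrowed and returned
print(w_in_phase[-1], w_irreversible, w_reversible)
```

With a perfectly in-phase polarization all transferred energy is returned by the end of the pulse, whereas even a small constant lag leaves a positive residue. The measured polarization lags before the peak and advances after it, which this constant-lag toy model does not capture.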

The energy density $W_{\mathrm{irreversible}} = W(t \to \infty)$ irreversibly deposited
in the system defines the charge carrier density promoted
from the valence band into the conduction band according to
$N_{\mathrm{carrier}} \approx W_{\mathrm{irreversible}}/\Delta_g$ (assuming population of the lowest-energy
states of the conduction band). For $E_{\mathrm{peak}} = 2.6$ V Å⁻¹ a residual

electron cloud around the silicon centres remains largely unaffected. b, The
response time of the nonlinear polarization is evaluated near the pulse peak
as a function of the peak intensity of the applied field (circles) and compared
to the results of a perturbation theory calculation (solid line; for details see
Supplementary Information section 4). The field-induced change in
refractive index $\Delta n$ is evaluated from $P_{\mathrm{NL}}(t)$ at the pulse peak as a function
of the applied peak intensity $I_{\mathrm{peak}}$. The nonlinear index $n_2$ determined from a
linear regression (dashed line) is approximately one-third of the values
acquired from time-integrated measurements using multi-cycle pulses. All
data points and error bars represent the average and the standard deviation,
respectively, of the evaluation of three individual data sets recorded under
identical conditions.


relative carrier concentration of $N_{\mathrm{carrier}}/N_{\mathrm{VB}}$ = 2.6 X 107‘ is found,
where $N_{\mathrm{VB}} = 1.4 \times 10^{23}$ cm⁻³ is the density of the valence-band
electrons. This small residual carrier concentration is pivotal for
future ultrafast signal processing and can hardly be determined
with similar sensitivity by any other experimental method. By analogy,
we can define the reversibly exchanged energy density as the
difference between the maximum transferred energy, $W_{\max}$, and
$W_{\mathrm{irreversible}}$. $W_{\mathrm{reversible}} = W_{\max} - W_{\mathrm{irreversible}}$ can be interpreted in
terms of a virtual conduction-band population with a number of
virtual carriers of $N_{\mathrm{virtual}} \approx W_{\mathrm{reversible}}/\Delta_g$. $N_{\mathrm{virtual}}$ is the result of a
projection of the laser-dressed and fully occupied valence-band states onto
conduction-band states and hence it fully returns the energy density
associated with it to the field upon its disappearance. In contrast, the
real population, $N_{\mathrm{carrier}}$, survives the field and, upon its subsequent
decay, causes dissipation.

Although $N_{\mathrm{virtual}}$ seems rather elusive, attosecond metrology presents
a very direct manifestation of the underlying reversible field-matter
energy exchange. The initial energy flow $dW/dt$ into the electronic
system first extracts energy from the field on the leading edge of the

Figure 4 | Energy exchange between strong optical fields and electrons
in real time. a, The amount of energy the few-cycle near-infrared laser
field transfers into a unit volume of silica is obtained from the measured
$E(t)$ and $P_{\mathrm{NL}}(t)$ via equation (1). $W(t)$ shows signatures of a substantial
transient virtual conduction-band population (which is proportional to
$W_{\max} - W_{\mathrm{irreversible}}$) oscillating in synchrony with the driving electric field.
In the steepest of these oscillations, energy is transferred into the material
within less than 650 as at $E_{\mathrm{peak}} = 2.7$ V Å⁻¹. The amount of energy
irreversibly dissipated in the sample, $W_{\mathrm{irreversible}}$, depends critically on the
maximum applied field strength $E_{\mathrm{peak}}$. Shown are the results of recordings
for three different field amplitudes with $E_{\mathrm{peak}} = 2.5$ V Å⁻¹, 2.6 V Å⁻¹ and
2.7 V Å⁻¹, as indicated, and a measurement closest to the average of five
recordings with $E_{\mathrm{peak}}$ set equal to 2.1 V Å⁻¹ (the uncertainty in the stated
values of $E_{\mathrm{peak}}$ is ±0.1 V Å⁻¹). At this field strength, $W_{\mathrm{irreversible}}$ becomes
immeasurably small (with its error exceeding its nominal value). In the
inset, $W(t)$ is computed from the nonlinear polarization at $z = \ell/2 = 5$ µm
obtained by the TD-DFT calculations outlined in Supplementary
Information section 5 for a set of three different values of the peak electric
field, as indicated. The spectrum of the computed nonlinear polarization
shows the emergence of odd harmonics of the fundamental radiation (high
harmonic generation). The results shown here are computed from the
low-pass filtered nonlinear polarization to mimic the frequency transfer
characteristics of the optical setup employed for the experiments. b, The
squared electric field evolution of the reference wave and its envelope
(black line) in comparison to the squared field and its envelope of the wave
transmitted at a peak field strength of 2.5 V Å⁻¹, showing clear indications
of energy redistribution and consequent reshaping of the pulse envelope
caused by the nonlinear polarization (see text). c, The dissipated energy
density, equal to the irreversibly transferred energy density shown in a, as a
function of the relative refractive index change, as extracted from the
results of the TD-DFT simulation (circles) and experimental data taken at

pulse. This is returned by a reversed energy flow on the trailing edge,
resulting in a temporal shift of the pulse peak, $\Delta\tau_{\mathrm{peak}} \approx 1$ fs, as shown in
Fig. 4b. The phenomenon is widely known as self-steepening or optical
shock wave formation. Our study reveals that this phenomenon is
an inherent consequence of the reversible field-matter energy transfer
accompanying the field-induced change in the phase of the pulse.
Hence, a field-induced change in the group index, $\Delta n_g$, is inextricably
linked to that of the refractive index, $\Delta n$, implying a group delay and
a phase shift, respectively.

Signal manipulation relies on the change of refractive index $\Delta n$ (and
$\Delta n_g$), which is characterized by $W_{\mathrm{reversible}}$; in contrast, dissipation is
detrimental to signal manipulation and is determined by $W_{\mathrm{irreversible}}$.
Hence, the scaling of $W_{\mathrm{irreversible}}$ and $W_{\mathrm{reversible}}$ (or, equivalently, $\Delta n$)
with the applied field strength is of key importance for future
signal-processing applications. We evaluated the dissipated energy per unit
volume and per several-femtosecond optical switching/modulation
cycle versus $\Delta n$ from our ab initio TD-DFT calculations, which
we verified against measurement at the highest field strength, near
optical breakdown (see Fig. 4c and the discussion in Supplementary
Information section 6). At measurable levels of $\Delta n$, $W_{\mathrm{irreversible}}$ in a
silica optical switch can be some four orders of magnitude smaller than
the heat dissipation of a state-of-the-art metal oxide semiconductor
field-effect transistor (MOSFET) operating at up to 10 GHz in
integrated circuits. This very much reduced dissipation per switching cycle
should therefore allow the operation of a dielectric switch/modulator
at 100 THz or beyond.

An equally important discovery is the sub-femtosecond rise time of
the transferred energy at $E_{\mathrm{peak}} > 2.5$ V Å⁻¹ within each optical cycle.


$E_{\mathrm{peak}} = 2.6 \pm 0.1$ V Å⁻¹ (square). The excellent agreement between theory
and experiment verifies the simulation results, permitting reliable 
prediction of the relevant quantities for much lower field strengths. The 
dashed line marks the dissipated energy density of a state-of-the-art 
MOSFET; for details see Supplementary Information Fig. 16. All error bars 
indicate the standard deviation of the evaluation of three sets of recordings 
performed under identical conditions. 


With slightly shorter pulses than those used in these experiments,
more than 90% of $W_{\max}$ will be transferred within a single
sub-femtosecond rise depicted in Fig. 4. The resultant buildup of carriers
in the conduction band within less than 1 fs will permit sampling of
electric-field waveforms beyond the petahertz frontier in the simple
setting demonstrated recently.

Our proof-of-principle study on silica shows that careful choice of
the peak electric field strength at $E < E_{\mathrm{crit}}$ may open a route towards
100-THz-rate signal processing. The observed sub-femtosecond
gradient in nonlinear energy transfer and the related change in
electronic/optical properties at $E > E_{\mathrm{crit}}$ may pave the way towards
sampling optical fields (from the infrared to the ultraviolet) with 
a compact, cost-effective solid-state device. A petahertz solid-state 
oscilloscope should enable signal processing and metrology at visible 
light frequencies. 

Traditional pump-probe spectroscopy makes use of the cycle- 
averaged amplitude envelope to resolve dynamics. In contrast, atto- 
second polarization spectroscopy uses the oscillating field as a probe, 
providing direct access to the full (linear and nonlinear) oscillating 
polarization and hence to the (reversible and irreversible) energy 
exchange between visible light and matter, as well as a delay in the 
system response. Hence, attosecond polarization spectroscopy is a gen- 
eralization of pump-probe spectroscopy, yielding complete informa- 
tion about the dynamic electronic response of matter to strong visible 
light fields with attosecond resolution and, thanks to the intense atto- 
second field gradients, with a signal-to-noise ratio orders of magnitude 
better than that of any other attosecond technique demonstrated so 
far. Implemented with a probe waveform of sufficiently broad spectral
coverage, the approach allows, in principle, complete retrieval of
the nonlinear polarization and hence of the entire response of the
electronic system to strong-field excitation.


Oil sands operations as a large source of secondary organic aerosols

Jeffrey R. Brook¹, Gang Lu, Ralf M. Staebler¹, Yuemei Han¹, Travis W. Tokarek², Hans D. Osthoff, Paul A. Makar¹, Junhua Zhang,
Desiree L. Plata³ & Drew R. Gentner³


Worldwide heavy oil and bitumen deposits amount to 9 trillion
barrels of oil distributed in over 280 basins around the world, with
Canada home to oil sands deposits of 1.7 trillion barrels. The global
development of this resource and the increase in oil production from
oil sands have caused environmental concerns over the presence of
toxic compounds in nearby ecosystems and acid deposition. The
contribution of oil sands exploration to secondary organic aerosol
formation, an important component of atmospheric particulate matter
that affects air quality and climate, remains poorly understood.
Here we use data from airborne measurements over the Canadian oil 
sands, laboratory experiments and a box-model study to provide a 
quantitative assessment of the magnitude of secondary organic aerosol 
production from oil sands emissions. We find that the evaporation 
and atmospheric oxidation of low-volatility organic vapours from the 
mined oil sands material is directly responsible for the majority of the 
observed secondary organic aerosol mass. The resultant production 
rates of 45-84 tonnes per day make the oil sands one of the largest 
sources of anthropogenic secondary organic aerosols in North 
America. Heavy oil and bitumen account for over ten per cent of global
oil production today, and this figure continues to grow. Our findings
suggest that the production of the more viscous crude oils could be 
a large source of secondary organic aerosols in many production 
and refining regions worldwide, and that such production should be 
considered when assessing the environmental impacts of current and 
planned bitumen and heavy oil extraction projects globally. 

In general, secondary organic aerosol (SOA) mass is formed from
the oxidation of organic gases, producing new compounds of sufficiently
low saturation concentration (C*) that they can nucleate or condense
onto pre-existing particles. SOA typically dominates total organic
aerosol (OA) mass, and can account for >50% of particulate matter
mass below 2.5 µm (PM2.5) at many locations in the northern hemisphere.
SOA is partially derived from the oxidation of routinely measured
volatile organic compounds (VOCs; C* > $10^{6}$ µg m⁻³). However,
recent evidence suggests that semi-volatility compounds (SVOCs;
C* = $10^{-1}$-$10^{3}$ µg m⁻³) and intermediate-volatility compounds
(IVOCs; C* = $10^{3}$-$10^{6}$ µg m⁻³) are also important aerosol precursors
owing to their high aerosol yields. While oil and gas production and
processing, including oil sands (OS) production, are known sources
of VOC emissions, their SVOC and IVOC emissions are unquantified.
This is particularly relevant for the OS, since the mined material
is a mixture of sand, water and clay coated in bitumen, the latter
being an extremely viscous (and low-volatility) form of petroleum
recovered through surface mining. During the Deepwater Horizon
(DWH) oil spill, SVOCs and IVOCs were the predominant precursors
of SOA formed downwind of the spill. Heavy oils and bitumen are
composed of lower-volatility hydrocarbons than DWH crude, such
that their extraction and processing might be expected to release a
disproportionately large fraction of SVOCs and IVOCs into the atmosphere
compared to lighter crude oil. On average, 5.04 × $10^{6}$ m³ month⁻¹
of bitumen was produced from OS surface mining operations in 2013
(ref. 17); should it be even slightly volatilized during production, there
would be a strong potential for large amounts of SOA to be formed
downwind of the region. This SOA formation potential from SVOC
and IVOC emissions is demonstrated later.
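
To make the role of C* concrete, the sketch below classifies a compound by its saturation concentration and estimates the fraction that would sit in the particle phase. The decade boundaries ($10^{-1}$, $10^{3}$ and $10^{6}$ µg m⁻³) are the conventional ones and are assumed here, and the partitioning expression $F_p = 1/(1 + C^*/C_{\mathrm{OA}})$ is the standard absorptive-partitioning form rather than anything taken from this paper. The function names are hypothetical.

```python
def volatility_class(c_star_ug_m3):
    """Volatility class from the saturation concentration C* (assumed boundaries)."""
    if c_star_ug_m3 > 1e6:
        return "VOC"
    if c_star_ug_m3 >= 1e3:
        return "IVOC"
    if c_star_ug_m3 >= 1e-1:
        return "SVOC"
    return "lower volatility"

def particle_fraction(c_star_ug_m3, organic_aerosol_ug_m3):
    """Equilibrium particle-phase fraction for absorptive partitioning."""
    return 1.0 / (1.0 + c_star_ug_m3 / organic_aerosol_ug_m3)

# Illustrative values only: C* of 1e5 and 10 ug/m3 at an OA loading of 10 ug/m3.
print(volatility_class(1e5), particle_fraction(1e5, 10.0))
print(volatility_class(10.0), particle_fraction(10.0, 10.0))
```

An IVOC vapour is almost entirely in the gas phase at typical OA loadings, so it contributes to SOA mainly after oxidation lowers the volatility of its products.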

Three aircraft measurement flights (F1, F2, F3) were conducted in 
Lagrangian patterns (Extended Data Fig. 1 and Supplementary Table 1), 
in which the same plume from OS operations was repeatedly sampled 
along tracks perpendicular to the plume axis (see Methods). Each flight 
intercepted two large, well-mixed plumes, revealing rapid SOA for- 
mation during transport, as illustrated in Extended Data Fig. 2 for F1 
(similarly observed during F2 and F3). One plume was dominated by
SO₂ and sulfate aerosols and the other by OA. While the sulfur plume
can be traced back to OS facility stack emissions associated with
desulfurization of raw bitumen, the origin of the large OA plume was less
clear, and yet OA accounted for >80% of the aerosol mass (Extended
Data Fig. 2). As the aircraft flew to different downwind distances from
the OS (screens A, B, C and D), peak OA mass increased from ~10
to 14 µg m⁻³ (A to B) and remained constant at ~12 µg m⁻³ (C to D),
despite ongoing dilution (indicated by large decreases in SO₄²⁻ and
black carbon (BC) aerosol concentrations), plume broadening (39
to 72 km) and particle deposition. This indicates a considerable SOA
formation rate within these plumes, overriding the effect of dilution.
Using BC as a tracer to correct for these effects (as described in
Supplementary Discussion), a sixfold relative increase in OA mass (as
SOA) is observed over 4 h (Fig. 1).
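
The idea behind the BC normalization can be shown with a small sketch. All concentrations below are hypothetical and were chosen only so that the arithmetic is easy to follow; the actual correction is described in the Supplementary Discussion and may differ in detail.

```python
def normalized_enhancement(oa, bc, oa_background, bc_background):
    """Above-background OA per unit above-background BC (the dOA/dBC ratio)."""
    return (oa - oa_background) / (bc - bc_background)

# Hypothetical concentrations in ug/m3: dilution lowers BC downwind, yet OA stays high.
near_source = normalized_enhancement(oa=10.0, bc=1.0, oa_background=4.0, bc_background=0.1)
downwind = normalized_enhancement(oa=12.0, bc=0.3, oa_background=4.0, bc_background=0.1)

relative_increase = downwind / near_source
print(near_source, downwind, relative_increase)
```

Because BC is inert and diluted in the same way as OA, a rising ratio points to new OA mass being formed rather than to a change in mixing.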

Net SOA formation rates were derived on the basis of mass balance
using the OA mass transfer rates (tonnes (t) h⁻¹) across the flight
screens. The SOA formation rate is the OA transfer rate difference
between screens. A description of the SOA production rate calculation,
extrapolation assumptions and associated uncertainties is given
in Methods. Accordingly, during F1, 3.4 ± 0.9 t h⁻¹ of SOA was formed
over ~90 km (A to D; Fig. 2), 2.7 ± 1.0 t h⁻¹ between the screens of
F2, and 2.1 ± 0.9 t h⁻¹ during F3 (Extended Data Fig. 3). Including
the SOA formed between the source region (S) and A, the cumulative
SOA formation rates were 4.7 ± 0.9, 5.3 ± 1.0 and 4.3 ± 0.9 t h⁻¹ during
F1, F2 and F3, respectively. Scaling by the time-integrated OH radical
concentration over daylight hours, these formation rates translate to
45-84 t day⁻¹ during the summer season. These remain underestimates
since they do not include deposition or SOA formation beyond the last
flight screens or at night. Correcting for depositional loss increases the
rates to 55-101 t day⁻¹.
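
The mass-balance bookkeeping can be followed with the reported central values. The sketch below derives the implied formation between the source region and the first screen for each flight; the intermediate transfer rates used for screens B and C of F1 are hypothetical placeholders, since only the end points are given in the text.

```python
# Reported central values in t/h (uncertainties omitted for brevity).
between_screens = {"F1": 3.4, "F2": 2.7, "F3": 2.1}   # first to last flight screen
cumulative = {"F1": 4.7, "F2": 5.3, "F3": 4.3}         # source region to last screen

def formation_between(transfer_rates):
    """SOA formed between consecutive screens = difference of OA transfer rates."""
    return [round(later - earlier, 2)
            for earlier, later in zip(transfer_rates, transfer_rates[1:])]

# Implied SOA formation between the source region and the first screen.
source_to_first = {flight: round(cumulative[flight] - between_screens[flight], 1)
                   for flight in cumulative}

# F1 transfer rates through screens A-D; the B and C values are hypothetical.
f1_transfer = [1.3, 2.9, 3.9, 4.7]
f1_increments = formation_between(f1_transfer)

print(source_to_first, f1_increments, round(sum(f1_increments), 1))
```

Whatever the intermediate values, the increments always sum to the difference between the last and first screens, which is why the A-to-D total depends only on the end points.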

The rates of SOA formation observed here are very large; the relative
rate of OA enhancement depicted in Fig. 1 is comparable to that downwind
of megacities such as Mexico City and Paris, and is higher than that


¹Air Quality Research Division, Environment and Climate Change Canada, Toronto, Ontario M3H 5T4, Canada. ²Department of Chemistry, University of Calgary, Calgary, Alberta T2N 1N4, Canada.
³Department of Chemical & Environmental Engineering, Yale University, New Haven, Connecticut 06520-8267, USA.

Figure 1 | Relative increase in OA downwind of the OS. The
above-background (Δ)OA is normalized by BC (ΔOA/ΔBC; left axis) and
shown as a function of photochemical age (−log(NOx/NOy); bottom axis)
and air mass transport time (top axis). Increases in ΔOA/ΔBC indicate
SOA formation. A sixfold relative increase in OA is observed (right axis),
comparable to those reported downwind of large urban areas. Data
points represent the average of the point-by-point ΔOA/ΔBC binned by
photochemical age. Grey boxes and whiskers represent 10th, 25th, 75th
and 90th percentiles of the data from all three flights (n = 2,573).


observed in Tokyo and New England, while the absolute rate (Fig. 2)
is comparable to that estimated during the DWH oil spill (~3.3 t h⁻¹;
ref. 15). However, a more compelling comparison to the absolute rate
is with SOA formation rates downwind of major urban centres using
available data (Fig. 2). For these urban centres, the SOA formed within
one photochemical day was estimated using reported ΔOA/ΔCO ratios
and daily CO emissions, assuming that CO is co-emitted with SOA
precursors (see Supplementary Discussion). The SOA formation rates
downwind of the Greater Toronto Area (Canada's largest metropolis),
Houston and the Mexico City Metropolitan area are estimated at 67,
52 and 228 t day⁻¹ (not accounting for deposition), respectively. Despite
the noted uncertainties described in Supplementary Discussion, this

Figure 2 | OA mass screens during F1. SOA production is estimated as
the sum of the differences in OA transfer rates between screens. The
overall rate from the source region (S) is the integrated OA transfer rate
through screen D (4.7 t h⁻¹). SOA formed within ~1 photochemical
day for major North American metropolitan areas is shown in the table,

comparison illustrates that OS operations are one of the largest sources
of anthropogenic SOA in North America.
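
The urban estimates rest on scaling CO emissions by a ΔOA/ΔCO ratio. The sketch below shows one way such a calculation could be set up; the unit conversion (CO at 25 °C and 1 atm), the example ratio of 70 µg m⁻³ per p.p.m.v. and the CO emission of 1,000 t day⁻¹ are assumptions for illustration, and the actual procedure is the one in the Supplementary Discussion. The −50% to +100% bounds follow the uncertainty quoted with the table for this approach.

```python
CO_MOLAR_MASS_G_PER_MOL = 28.01
MOLAR_VOLUME_L_PER_MOL = 24.465   # ideal gas at 25 degC and 1 atm (assumed conditions)

def soa_tonnes_per_day(oa_per_co_ug_m3_per_ppmv, co_emissions_t_per_day):
    """Scale daily CO emissions by the OA-to-CO mass ratio implied by dOA/dCO."""
    co_ug_m3_per_ppmv = CO_MOLAR_MASS_G_PER_MOL / MOLAR_VOLUME_L_PER_MOL * 1000.0
    mass_ratio = oa_per_co_ug_m3_per_ppmv / co_ug_m3_per_ppmv
    return mass_ratio * co_emissions_t_per_day

def uncertainty_bounds(estimate):
    """Bounds for a -50% to +100% uncertainty on the estimate."""
    return 0.5 * estimate, 2.0 * estimate

example_soa = soa_tonnes_per_day(70.0, 1000.0)   # hypothetical inputs
gta_low, gta_high = uncertainty_bounds(67.0)     # reported Greater Toronto Area estimate
print(example_soa, gta_low, gta_high)
```

Applied to the reported estimate for the Greater Toronto Area, the bounds span roughly 34 to 134 t day⁻¹, which overlaps the 45-84 t day⁻¹ range found for the oil sands.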

The SOA in these OS plumes had characteristics of two types of
oxygenated organic aerosols (OOA) as represented by two factors derived
from positive matrix factorization (PMF) analysis of aerosol mass
spectrometry data. Factor 1 (Extended Data Fig. 4) was more oxygenated
than factor 2 (Fig. 3a), indicating that it was more photochemically
aged. The time series of the factors during F1 are shown in Fig. 3b.
Factor 1 was regionally distributed, dominating outside the plumes
(>80%) at 3-5 µg m⁻³, and largely consisted of aged regional biogenic
SOA, as its mass spectrum was highly similar to those reported over
forests and from monoterpene oxidation in smog chamber experiments
(Extended Data Fig. 4). Factor 2 accounted for >90% of the
SOA mass in the plume and was freshly formed from the oxidation of
OS emissions. Its mass spectrum is almost identical to the spectra of OA
derived from the OH oxidation of bitumen vapours in chamber
experiments (r² > 0.96) (Fig. 3a and Extended Data Fig. 4), indicating that
bitumen vapours are important precursors to the large SOA formation
rates in OS plumes (see Supplementary Discussion).
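
A similarity score of this kind is commonly computed as the squared Pearson correlation between two normalized mass spectra. The sketch below assumes that definition, which the text does not spell out, and uses two short hypothetical spectra in place of the measured ones.

```python
import numpy as np

def spectral_r2(spectrum_a, spectrum_b):
    """Squared Pearson correlation between two spectra on the same m/z grid."""
    a = np.asarray(spectrum_a, dtype=float)
    b = np.asarray(spectrum_b, dtype=float)
    return float(np.corrcoef(a, b)[0, 1] ** 2)

# Hypothetical relative intensities at seven m/z values (not measured data).
ambient_factor = np.array([0.02, 0.10, 0.08, 0.15, 0.05, 0.12, 0.03])
chamber_soa = np.array([0.025, 0.095, 0.085, 0.14, 0.055, 0.125, 0.028])

r2_pair = spectral_r2(ambient_factor, chamber_soa)
r2_scaled = spectral_r2(ambient_factor, 3.0 * ambient_factor)
print(r2_pair, r2_scaled)
```

The score is unchanged by rescaling either spectrum, so normalizing the signal to the total aerosol mass spectrometry signal does not affect the comparison.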

The contribution of oxidized bitumen vapours to the observed
SOA depends strongly on the initial volatility of the SOA precursors.
To assess their SOA formation potentials, the volatility distributions
(VDs) of bitumen vapours evolved from OS ore were determined (see
Supplementary Methods), where the VD represents the fractions of total
vapour in different ranges of C*. At 20°C, the majority of vapour evolved
is in the C14-C16 hydrocarbon range (IVOC; C* = $10^{5}$ µg m⁻³), and shifts
only slightly at 60°C (Fig. 4a). While gaseous emissions exist that span
the C12-C18 range at ambient temperatures, heating of the material
(70°C) results in complete evaporative loss up to C15 (Extended Data
Fig. 5), leaving primarily compounds from C16 to >C39. This represents
a volatilization of <15% of the total extractable hydrocarbon mass from
the ore at 50°C, increasing further at higher temperatures (Fig. 4b). In
surface mining operations, ore material is obtained via open-pit mining
followed by bitumen-sand separation using hot water (40-80°C) and
further refining at up to 500°C. These derived bitumen vapour VDs
clearly demonstrate the potential for atmospheric emissions of SOA
precursors in a C* range associated with strong SOA formation.
On the basis of their volatility, such emissions are certain to occur during
open-air mining and the various heated processing steps. Ambient
ground-based measurements also show the existence of hydrocarbons


Region                  Population (millions), area (km²)   Estimated SOA (t day⁻¹)
GTA                     6.05, 7,120                          67
Greater Houston area    4.1, 5,580                           52
Mexico City area        21.2, 7,730                          228
Athabasca oil sands     0.061, 844                           45-84

compared to the range downwind of the OS (F1, F2, F3). Using ΔOA/ΔCO
to derive SOA for cities has been estimated to carry −50% to +100%
uncertainties. GTA, Greater Toronto Area. Map data: Google, image
Landsat, Cnes/Spot Image 2015.

Figure 3 | PMF analysis for F1. a, PMF factor 2 profile during F1
compared to the mass spectra of SOA from the oxidation of bitumen
vapours in a smog chamber, demonstrating a high degree of similarity
(r² = 0.96). Signal is normalized to the total aerosol mass spectrometry
(AMS) signal. b, Factor time series during F1 for consecutive plume
intercepts approximately 1 h apart, at 600 m altitude. Factor 2 dominates
the aerosol mass within the plume (red curve).


in this volatility range in plumes from OS facilities (Extended Data Fig. 6 
and Supplementary Methods). 

The bitumen SVOC and IVOC conversion to SOA in the observed 
plumes was further assessed with a Lagrangian box model constrained 
by the airborne measurements (Fig. 4c). The model simulated the for- 
mation of SOA in the plume of F1 over 3h (screen A to D; Extended 
Data Fig. 2). Further details of the box model inputs and outputs are 
provided in Methods. From the ~70 p.p.b.v. of total VOCs measured 
at screen A, Fig. 4 demonstrates that only <6% of the SOA after 3h was 
contributed by the oxidation of speciated alkanes, alkenes and aromatic

Figure 4 | Modelling SOA formation during F1. a, Volatility distribution
of bitumen vapours at 20°C and 60°C. b, Fraction of the OS that is
non-volatile (grey) and the volatile fraction (purple). Error bars represent
standard deviation (s.d.) of n = 3 experiments. c, Box modelling of SOA
formation during F1. A discrepancy between measured and modelled
OA is reconciled by including 3.0-4.5 p.p.b.v. of bitumen IVOC vapours
at time = 0 h (blue arrows). Error bars represent s.d. of the measured OA
(n = 7). The pie chart indicates the contribution by each precursor type to
the mass of SOA after 3 h.

hydrocarbons, and <9% by isoprene and monoterpenes. The observed
OA can only be reproduced by including bitumen SVOCs and IVOCs
with the VD of Fig. 4a at 20°C; adding 3-4.5 p.p.b.v. of bitumen SVOCs
and IVOCs (with the current SOA ageing scheme used) at screen A
adequately simulated the SOA measurements after 3 h (contributing
~86% of the SOA; Fig. 4c). Hence, even though the required SVOC
and IVOC concentrations may be small (3-4.5 p.p.b.v.) compared to
~70 p.p.b.v. for VOCs, they dominate the contributions to SOA
formation. Such a high SOA formation intensity is in contrast to most
other types of energy production, which are likely to have emissions
in a much lighter hydrocarbon range.

The evidence here indicates that large amounts of SOA will form 
from this previously unrecognized pool of OS-emitted SVOCs and 
IVOCs, dominating over SOA from traditional VOC precursors. The 
potential air-quality impacts of these vapours as a result of transport 
and refining could be more widespread than anticipated. Indeed, recent 
evidence indicates that primary IVOCs from an unknown petroleum- 
based source can account for about 30% of SOA mass in urban/ 
suburban areas. This issue is not limited to Canada, as Venezuela 
plans to develop its Orinoco Oil Sands recoverable reserve of ~300 
billion barrels, and the USA—having an estimated 54 billion barrel 
reserve of bitumen—has begun surface mining in Utah. In light of the 
current trend for increasing heavy oil production relative to conven- 
tional crude, further investigation is required to fully understand the 
magnitude of this potential global issue. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 


Aircraft campaign. Airborne measurements of an extensive set of air pollut- 
ants over the Athabasca oil sands region in northern Alberta were conducted 
between 13 August and 7 September 2013 in support of the Joint Canada-Alberta 
Implementation Plan on Oil Sands Monitoring. Instrumentation was installed 
aboard the National Research Council of Canada Institute for Aerospace Research 
(NRC Aerospace) Convair-580 research aircraft. The aircraft flew 22 flights over 
the Athabasca oil sands, for a total of approximately 84h. Thirteen flights were 
designed specifically to quantify area emissions from various OS facilities by flying 
in a rectangular box shape, at multiple altitudes, resulting in 21 box flights around 
7 different OS facilities. 

A further three flights (denoted F1 (4 September), F2 (5 September) and 
F3 (19 August)) were designed to study the transformation of OS emitted pollutants, 
including the formation of SOA. These flights were designed as Lagrangian exper- 
iments in which the same air parcels in OS plumes were sampled at different time 
intervals (1h apart) as the air parcels were transported downwind for 4-5 h. The 
measurement locations for the flight tracks were chosen so that the aircraft would 
intercept the same air parcel, using real-time wind speed/direction measurements 
to guide the intercept locations. The intercepting flight tracks were perpendicular 
to the axis of the plumes, and the flight times crossing the plumes were 5-7 min. At 
each intercept location, high time resolution measurements (1 s for gases, 10 s for 
AMS) were made at multiple altitudes (2-5 horizontal transects) 
from ~150 m above ground to over 1,400 m, which was higher than the mixed 
layer height, consisting of level flight tracks and spirals at the centre of the plume. 
These vertically spaced level flight tracks and spirals constituted virtual screens 
at the intercept locations. The three flights (F1, F2 and F3) comprised 5, 3 and 3 
screens, respectively. In between the screens in each flight, there were no industrial 
emissions. Thus, changes between screens can be described in terms of mixing/ 
dilution, chemistry and deposition occurring within a single air parcel. 

The first screens of the F1, F2 and F3 flights were approximately 1 h downwind 
of the majority of OS facilities, at distances where pollutants from multiple OS 
sources were well mixed and merged into large plumes. The flight paths and their 
associated parameters are given in Extended Data Fig. 1 and Supplementary Table 1. 
As shown in this figure, the Lagrangian experiments resulted in varying degrees 
of success for a number of reasons, including data capture rates, consistency of 
winds, and the exact timing of when the aircraft crossed the plumes at the chosen 
intercepting locations, with F1 having the best matches between the air parcel 
transport times and the aircraft flight times at the screen locations. As a result, the 
data from F1 are used more extensively than others here, although not exclusively. 

The Convair-580 was equipped with fast response instrumentation to measure 
an extensive set of gas- and particle-phase pollutants, as well as standard 
meteorological and aircraft state parameters. A description of the meteorological 
variables and aircraft state parameters measured is given elsewhere (ref. 18). 
Non-refractory (NR) particle composition (that is, ammonium, nitrate, sulfate and 
organics) was measured with an Aerodyne high-resolution time-of-flight aerosol 
mass spectrometer (HR-ToF-AMS; Aerodyne Research). Refractory black carbon (BC) 
particle measurements were made with a Single Particle Soot Photometer (SP2; 
Droplet Measurement Technologies). A subset of volatile organic compounds 
(VOCs) was measured with a high-resolution proton transfer time-of-flight mass 
spectrometer (PTR-ToF-MS; Ionicon Analytik GmbH) and a more extensive 
set of hydrocarbons was measured via on-board canister sampling, followed by 
analysis by gas chromatography mass spectrometry and flame ionization detection 
(GC-MS and GC-FID). A full description of all the relevant gas- and particle-phase 
instrumentation aboard the aircraft is provided in the Supplementary Information. 
No statistical methods were used to predetermine sample size. 

OA mass transfer rate and OS SOA production rate calculations. The quantification 
of the mass transfer rate of organic aerosols (R_OA, in t h⁻¹) across a 
virtual screen uses an extension of the top-down emission rate retrieval algorithm 
(TERRA) described previously (ref. 18). TERRA was originally developed to determine 
emission rates from box flight patterns during this study (ref. 18), based on mass 
balance within the virtual box constructed from the flight tracks. Briefly, TERRA 
uses the flight path around a facility at multiple altitudes to map the data to the 
two-dimensional virtual walls of a box surrounding the facility. The transport 
of a pollutant through the walls is calculated using aircraft wind and compound 
mixing ratio measurements, and emission rates calculated on the basis of the 
divergence theorem with estimations of box-top loss rates, horizontal and vertical 
advective and turbulent transport rates, surface deposition rate, and apparent loss 
rates due to air densification and chemical reaction rates. For the transformation 
flights, some components of TERRA were extended to apply to single screens 
created from vertically stacked level flight tracks and spirals. Concentration data C 
(in µg m⁻³) are mapped to the screens and interpolated using a simple kriging function 
(on approximately 5,000-15,000 individual data points). Wind speed along the 
flight tracks was decomposed into two components based on the wind direction, 
one parallel to the screen (u_p) and the other normal to the screen (u_n), and the 
decomposed wind speeds were similarly mapped to the screen and interpolated 
using kriging. The lowest flight altitude was at approximately 150 m, hence there 
was a need to extrapolate the OA measurements and the wind speed components 
downward to the ground surface. The downward extrapolation for the wind 
speed components assumed a stability-dependent vertical log profile and used 
nearby concurrent wind profiler data to determine the roughness and displacement 
height. The OA measurement downward extrapolation was based on the 
assumption of a well-mixed layer below the lowest flight track altitude, which is 
consistent with modelling and the potential temperature profile. A variation to 
this downward extrapolation method assumed a linear downward trend from the 
flight altitudes, to capture possible variations in the mixing state below the lowest 
flight track altitude. Previous analysis has shown that unknown pollutant concen- 
trations below the lowest flight level (and the associated extrapolation to ground) 
led to the majority of the uncertainty in the emissions estimates from this approach 
(~20%; ref. 18). The OA measurements during the flights here were extrapolated 
downward using both methods; varying linearly to the ground or held constant 
(at the lowest altitude concentration) to the ground, to assess the uncertainty in 
the final derived mass transfer rate caused by the extrapolation methods. The OA 
data were further linearly extrapolated from the highest altitude level flight tracks 
upwards (to background OA concentrations) in the case where the level flight 
tracks did not traverse vertically beyond the mixed layer. The highest altitude 
extrapolated to was determined from the OA measurements and temperature pro- 
files from spirals along the tracks, which were flown above the top of the boundary 
layer but not included in the screens. The results showed a difference of <15% for 
the mass transfer rates among the different extrapolation schemes. 

The mass transfer rate of OA across each screen (R_OA) of flights F1, F2 and 
F3 was derived on the basis of the extended TERRA as described earlier and the 
HR-ToF-AMS data. To avoid the background OA affecting the computation of 
R_OA, a background OA (Extended Data Fig. 7) was subtracted from the OA 
measurements in the following computation: 


$$R_{\mathrm{OA}}(A) = \int_{z_1}^{z_2}\int_{s_1}^{s_2} C(s,z,A)\,U_n(s,z,A)\,\mathrm{d}s\,\mathrm{d}z \qquad (1)$$


where s_1 and s_2 are the horizontal edge positions on the screen for the plume 
containing OA, z_1 is the ground surface altitude, z_2 is the top of the plume, C(s,z,A) 
is the interpolated/extrapolated concentration on screen A (and likewise for other screens), and 
U_n(s,z,A) is the interpolated/extrapolated wind speed component normal to screen A. 
The plume edges are determined by the OA concentration on the screen, indicated 
by C(s,z,A), approaching the background concentration of approximately 4 µg m⁻³. 
Note that equation (1) describes horizontal advective transfer rates only; 
horizontal turbulent fluxes can also contribute to R_OA, but this contribution has 
been shown to be a few orders of magnitude smaller than the horizontal advective 
transfer (ref. 18) and is therefore ignored henceforth. 
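
The double integral in equation (1) can be illustrated with a short numerical sketch. This is not the TERRA implementation: it assumes the kriged concentration and normal wind fields have already been resampled onto a regular screen grid, and the function name, argument names and synthetic inputs are hypothetical.

```python
def mass_transfer_rate(conc, wind_normal, ds, dz):
    """Approximate R_OA = double integral of C * U_n over the screen.

    conc:        2-D list [z row][s column] of background-subtracted OA (ug m^-3)
    wind_normal: 2-D list of wind speed normal to the screen (m s^-1)
    ds, dz:      horizontal and vertical grid spacing (m); a regular grid is assumed
    Returns the transfer rate in tonnes per hour.
    """
    flux_ug_per_s = 0.0
    for c_row, u_row in zip(conc, wind_normal):
        for c, u in zip(c_row, u_row):
            # ug m^-3 * m s^-1 * m^2 = ug s^-1 through one grid cell
            flux_ug_per_s += c * u * ds * dz
    # ug s^-1 -> t h^-1 (1 t = 1e12 ug, 3600 s per hour)
    return flux_ug_per_s * 3600.0 / 1e12


if __name__ == "__main__":
    # Synthetic screen: 40 km wide, 1 km deep, uniform 5 ug m^-3 and 5 m s^-1.
    conc = [[5.0] * 40 for _ in range(10)]
    wind = [[5.0] * 40 for _ in range(10)]
    print(mass_transfer_rate(conc, wind, ds=1000.0, dz=100.0))  # about 3.6 t h^-1
```

The uniform test case is chosen only so the result can be checked by hand (5 µg m⁻³ × 5 m s⁻¹ × 4 × 10⁷ m² = 1 kg s⁻¹).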

Between screens, the mass transfer rate R_OA may change due to emissions with 
a rate of E_OA, deposition with a rate of D_OA, and the formation of SOA at a rate of 
R_SOA. In the original TERRA, vertical advective and turbulent transfer rates as well 
as air density changes were considered to achieve mass balance when the background 
level of a compound was large (ref. 18). The vertical transport term was nominally 
small compared to the horizontal advection, and hence can be ignored. Thus, using 
a mass balance approach, the following relationship can be established: 


$$R_{\mathrm{OA}}(t_2) = R_{\mathrm{OA}}(t_1) + R_{\mathrm{SOA}} + E_{\mathrm{OA}} - D_{\mathrm{OA}} \qquad (2)$$


where t_1 and t_2 are the times of the two screens where the plume parcels were 
intercepted. Positive matrix factorization (PMF) analysis of the HR-ToF-AMS data 
from the transformation flights F1, F2 and F3 showed no hydrocarbon-like aerosol 
factor, suggesting small-to-non-existent contributions from primary emissions 
of organic aerosols between the screens or from the source region to the screens. 
Hence E_OA = 0. Using concurrent refractory BC measured by SP2, the maximum 
dry deposition of BC over the region was estimated to be approximately 7% h⁻¹, 
derived from the differences in the BC mass transfer rates across the screens. We 
assume that this rate of deposition of BC is applicable to OA. Since deposition 
derived this way is relatively small, it is ignored to derive the SOA formation rate 
according to 


$$R_{\mathrm{SOA}} \approx R_{\mathrm{OA}}(t_2) - R_{\mathrm{OA}}(t_1) \qquad (3)$$


Equation (3) was used to calculate the SOA formation rates, ignoring the dry deposition 
term, to be comparable to urban SOA estimates, which are net of deposition. 
Accounting fully for dry deposition would raise the calculated R_SOA, so 
equation (3) gives a lower limit of the true SOA formation rate during the measurement 
period. The total SOA production rate (R_SOA) in these flights is taken to be the 
OA transfer rate (R_OA) through the final screen, since E_OA = 0 and only oxygenated 
PMF factors were observed. The total SOA is then extrapolated to a photochemical 
day as described in Supplementary Discussion (Extended Data Fig. 8). 
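
As a minimal sketch of how equation (3) is applied across a sequence of screens, the snippet below sums the screen-to-screen differences in OA transfer rate. The function name and the input values are illustrative assumptions, not measurements from the flights.

```python
def soa_formation_rate(r_oa_by_screen):
    """Sum R_OA(t2) - R_OA(t1) over consecutive screens (equation (3)).

    r_oa_by_screen: OA transfer rates (t h^-1) in downwind order.
    Emissions and deposition between screens are neglected, as in the text,
    so the result is a lower limit on the SOA formation rate.
    """
    increments = [
        later - earlier
        for earlier, later in zip(r_oa_by_screen, r_oa_by_screen[1:])
    ]
    return sum(increments)


if __name__ == "__main__":
    # Hypothetical transfer rates through three successive screens.
    print(soa_formation_rate([2.0, 3.5, 4.5]))  # 2.5 t h^-1 formed between first and last screen
```

Because the increments telescope, the sum equals the transfer rate through the last screen minus that through the first; keeping the per-screen increments makes it easy to inspect how formation varies with plume age.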

Box modelling description. SOA formation in the large-scale plume of F1 was 
modelled with a zero-dimensional Lagrangian box model, as it evolved over 
approximately 3 h (~600 m altitude). The simulation was constrained by the 
measurements of VOCs, NOx, OVOCs, O3 and other parameters, while dilution 
within the plume was accounted for using BC as a dilution tracer. Hydrocarbons 
of both anthropogenic and biogenic origin were constrained at the first screen (A), 
or throughout the simulation for those biogenic species with potential continuous 
emissions along the flight track (monoterpenes and isoprene). Background 
concentrations were constrained by measurements outside of the plume. The model 
uses the Statewide Air Pollution Research Centre (SAPRC07) chemical mechanism 
with updated isoprene chemistry. The model was run with a 2 min time step 
and diluted chemical species at every time step. While the model had VOCs 
constrained, including a constraint for NOx and O3 resulted in very little difference 
between the model and observations. Hence, the gas-phase chemistry is well 
simulated by the box model, as shown in Extended Data Fig. 9. Sesquiterpenes were 
constrained based on the ratio to measured monoterpenes. Sesquiterpenes were 
estimated from the PTR-ToF-MS measurements using an estimated ion 
transmission efficiency and proton transfer reaction kinetics, in a manner described 
previously, resulting in a sesquiterpene:monoterpene ratio of ~0.39. This is 
somewhat higher than the ratios of 0.013 and 0.105 that have been recommended 
previously, and was used as an upper estimate to the sesquiterpene contribution 
to SOA. Regardless, biogenic VOCs contributed little to the observed and modelled 
SOA (Extended Data Fig. 10 and Supplementary Discussion). Recent evidence 
has also suggested that extremely low-volatility compounds (ELVOC) can 
form via an auto-oxidation mechanism. This process has been demonstrated to 
be most relevant in rural and remote regions where OA loading, VOC and NOx 
levels are very low, because RO2 + NO and/or RO2 + RO2 reactions otherwise compete 
with it. Previous data indicate that ELVOC yields are most important at 1 p.p.b.v. NOx 
and below. While ELVOC may be an important SOA contributor outside of the 
OS plumes (where biogenics are abundant and NOx is low), the amount of NOx 
in the OS plumes studied (as well as the OA loading and VOC levels) was far too 
high (reaching >20 p.p.b.v. NOx and always greater than 1 p.p.b.v.) for ELVOC 
formation to be important. Hence, the contribution of ELVOC was not explicitly 
included in the box model analysis. 

Additionally, the model incorporated SOA formation from all known SOA 
precursors, treating SOA formation in two separate volatility basis sets (VBSs) 
(see Supplementary Methods). Following a previously described method, 
a four-bin VBS (C* = 1, 10, 100 and 1,000 µg m⁻³) treated SOA formation from 
traditional volatile organic compounds (VOCs), while a second nine-bin VBS 
(C* = 10⁻²-10⁶ µg m⁻³) treated SOA from SVOCs and IVOCs. The four-bin VBS 
was used for SOA from traditional VOCs including long-chain alkanes (ALK5 
in SAPRC07), olefins (OLE1 and OLE2), aromatics (ARO1, ARO2, NAPTH and 
benzene), and biogenic compounds (ISOP, TERP and SESQ; isoprene, 
monoterpenes and sesquiterpenes, respectively). The nine-bin VBS treated 'non-traditional' SOA 
formed from the oxidation of off-road diesel as well as bitumen vapours having 
a volatility distribution as shown in Fig. 4a at 20 °C. This volatility distribution was 
chosen to represent the emissions of these vapours at ambient temperature that 
would be expected for the first aircraft screen at ~600 m above ground, assuming 
that the open-pit mines are the largest contributor to emissions. A contribution by 
other processes at higher temperature is also possible. Total non-methane 
hydrocarbon (NMHC) mixing ratios in the plume were estimated based on the emission 
ratios of CO:NMHC from the heavy hauler diesel engines used in the Alberta OS 
facilities and the difference between CO in the plume and CO in the background 
(ΔCO). The emission ratios of SVOCs and IVOCs relative to total NMHC that 
were reported previously for diesel engines were then applied to the total NMHC 
to give an estimate of the SVOCs and IVOCs in the plume. Pentadecane was used 
as a surrogate species for the SVOC and IVOC species from diesel emissions as 
suggested previously. 

The model is configured in such a way that the initial reaction of a SOA 
precursor with OH (or O3 in the case of ISOP, TERP, OLE1 and OLE2) leads to the 
formation of a number of less volatile gas-phase species. These less volatile 
gas-phase species are placed in volatility bins according to fitted chamber results. 
The species in each of the bins are then allowed to partition between the gas and 
particle phase in accordance with their temperature-dependent partitioning 
coefficients. To mimic aerosol ageing, the gas-phase components in both the 
VOC SOA (V-SOA) and semi- and intermediate-volatility SOA (SI-SOA) VBS 
are aged as described previously. Specifically, traditional SOA in the V-SOA 
VBS is aged according to the Robinson et al. scheme, while SOA in the SI-SOA 
VBS is aged according to the more aggressive Grieshop scheme. The Robinson 
scheme used to age V-SOA adds 7.5% more mass to the SOA during oxidation 
while moving the species to a volatility bin 10 times less volatile. The Grieshop 
scheme that was used to age the SI-SOA adds 40% more mass per oxidation but 
shifts the species to a volatility bin 100 times less volatile. As the majority of the 
SOA formed in the V-SOA VBS is formed from anthropogenic precursors, V-SOA 
was aged at a rate of 1 × 10⁻¹¹ cm³ molecule⁻¹ s⁻¹ (refs 48, 49). The SOA in the 
SI-SOA VBS was aged using a faster rate of 2 × 10⁻¹¹ cm³ molecule⁻¹ s⁻¹ (ref. 24). 
The use of two separate ageing schemes for SOA formation is consistent with the 
expected differences between product distributions, molecular size and functional 
groups of different classes of precursor organic compounds. Such an approach has 
been used successfully on numerous occasions to match SOA observations (see 
Supplementary Methods). Further model runs were also performed to examine the 
sensitivity of the SOA formed from IVOCs to the oxidation scheme used (Extended 
Data Fig. 9 and Supplementary Methods). On the basis of these further model runs, 
the chosen base case conditions provide the best estimate of the SOA formation 
rate, as it lies between the upper and lower limits and is consistent with the 
scheme used in numerous regional air quality models that reasonably reproduce 
ambient forested and urban observations around the world. 
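
The bin-shifting logic of the two ageing schemes can be sketched as follows. This is a simplified illustration, not the box model itself: it applies one full oxidation generation to all gas-phase mass at once rather than ageing at the OH rate constants quoted above, and it assumes the standard absorptive-partitioning expression for the particle-phase fraction, which the text does not spell out. All function and variable names are hypothetical.

```python
def particle_fraction(c_star, c_oa):
    """Equilibrium particle-phase fraction of a bin (assumed absorptive partitioning).

    c_star: saturation concentration of the bin (ug m^-3)
    c_oa:   total organic aerosol mass concentration (ug m^-3)
    """
    return 1.0 / (1.0 + c_star / c_oa)


def age_once(bins, mass_gain, decades):
    """Apply one oxidation generation to gas-phase mass in a VBS.

    bins:      dict mapping log10(C*) -> gas-phase mass (ug m^-3)
    mass_gain: fractional mass added per oxidation (0.075 or 0.40 in the text)
    decades:   number of decades the product drops in volatility (1 or 2)
    """
    lowest = min(bins)
    aged = {log_c: 0.0 for log_c in bins}
    for log_c, mass in bins.items():
        target = max(log_c - decades, lowest)  # clamp at the least volatile bin
        aged[target] += mass * (1.0 + mass_gain)
    return aged


if __name__ == "__main__":
    v_soa = {0: 0.0, 1: 0.0, 2: 1.0, 3: 0.0}
    print(age_once(v_soa, mass_gain=0.075, decades=1))  # Robinson-type step
    si_soa = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0}
    print(age_once(si_soa, mass_gain=0.40, decades=2))  # Grieshop-type step
    print(particle_fraction(c_star=10.0, c_oa=10.0))    # half of the bin is in the particle phase
```

The contrast between the two calls shows why the SI-SOA scheme is described as more aggressive: each oxidation step adds more mass and moves material two decades lower in volatility, so a larger share of it partitions to the particle phase.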

The model output was compared with organic aerosol observations. While the 
HR-ToF-AMS effectively measures PM1.0, the condensation of oxidized products 
will occur across the entire size distribution. Considerable coarse particle mass is 
observed during flight 1, probably originating from the large trucks during mining 
operations. Since the box-model output is a bulk SOA value (that is, size 
independent), the AMS-derived OA mass is further increased using the measured surface 
area ratio of PM1.0 to PM20, assuming that the condensation process is 
approximately proportional to surface area. This ratio, which ranged from ~1.3 to 1.1 
from screen A to screen D, was multiplied by the AMS-measured OA, increasing 
the total OA by 10-30% for comparison to the model output. 
Extended Data Figure 1 | Flight tracks for the three transformation flights, F1, F2 and F3. The approximate locations of the major OS plumes studied 
in this work are shown as the white shaded boxes. Map data: Google, image Landsat, 2015. 
Extended Data Figure 2 | Measured organic and sulfate aerosol 
concentration during F1. Successive transects (labelled A, B, C and D) 
through the same major OS plumes at approximately 600 m altitude and 
1 h apart in transit time. Inset pie plots show the mean relative mass 
fraction for organics (green), sulfate (red), nitrate (blue) and ammonium 
(orange) during the yellow highlighted section. Organics dominate the 
aerosol mass throughout the flight; note the change in magnitude between 
the OA scale on the left and SO4 scale on the right. Map data: Google, 
image Landsat, 2015. 
Extended Data Figure 3 | OA mass screens used to estimate SOA 
production. a, b, OA mass screens for F2 (a) and F3 (b). The SOA 
production rate during these flights (~77 km and ~50 km between 
screens) is the sum of the differences in OA transfer rates between screens 
(that is, 2.7 ± 1.0 t h⁻¹ and 2.1 ± 0.9 t h⁻¹). The overall formation rate 
from the OS source region (S) is the integrated OA transfer rate through 
screen B (5.3 ± 1.0 t h⁻¹ and 4.3 ± 0.9 t h⁻¹). Map data: Google, image 
Landsat, 2015. 
b, PMF factor 1 from F1. A high degree of similarity is observed between 
Extended Data Figure 5 | Bitumen volatility distributions. The volatility 
distribution (mass fraction) based on carbon number is for OS that was 
thermally treated. Volatile hydrocarbons are trapped on polyurethane 
foam (PUF) tubes at 50-80 °C (red). The volatility of the remaining 
bitumen material is shown in green (50-80 °C) and that of bitumen which 
was solvent extracted from the sand without heating is shown in grey. 
Note the complete loss of hydrocarbons in the C12-C15 range upon heating 
(denoted in yellow). Data are stacked upon each other for clarity. Error 
bars represent the s.d. of n = 3 experiments. 
Extended Data Figure 6 | Bitumen-related IVOCs in ambient ground- 
based data. a, Total ion chromatogram from ambient sampling in the OS 
when impacted by forest-influenced air (blue) and OS-operations air (red). 
The bitumen vapour headspace chromatogram is also shown (black), 
demonstrating that a large fraction of the gaseous mass in OS-impacted air 
has volatilities (C};-C)¢ range) critical for SOA formation. b, Associated 
volatility distribution for OS-impacted air scaled by SOA yield’. c, One- 
hour back trajectory for OS-impacted sample using the hybrid single 
particle Lagrangian integrated trajectory model (HYSPLIT). d, One-hour 
HYSPLIT back trajectory for forest-influenced sample. 
percentiles shown, n = 22,280), indicating that factor 2 (using a collection 
Experimental determination of the electrical 
resistivity of iron at Earth’s core conditions 


Kenji Ohta, Yasuhiro Kuwayama, Kei Hirose, Katsuya Shimizu & Yasuo Ohishi 


Earth continuously generates a dipole magnetic field in its 
convecting liquid outer core by a self-sustained dynamo action. 
Metallic iron is a dominant component of the outer core, so its 
electrical and thermal conductivity controls the dynamics and 
thermal evolution of Earth’s core. However, in spite of extensive 
research, the transport properties of iron under core conditions 
are still controversial. Since free electrons are a primary 
carrier of both electric current and heat, the electron scattering 
mechanism in iron under high pressure and temperature holds 
the key to understanding the transport properties of planetary 
cores. Here we measure the electrical resistivity (the reciprocal of 
electrical conductivity) of iron at the high temperatures (up to 4,500 
kelvin) and pressures (megabars) of Earth’s core in a laser-heated 
diamond-anvil cell. The value measured for the resistivity of iron 
is even lower than the value extrapolated from high-pressure, low- 
temperature data using the Bloch-Grüneisen law, which considers 
only the electron-phonon scattering. This shows that the iron 
resistivity is strongly suppressed by the resistivity saturation effect 
at high temperatures. The low electrical resistivity of iron indicates 
the high thermal conductivity of Earth’s core, suggesting rapid core 
cooling and a young inner core less than 0.7 billion years old. 
Therefore, an abrupt increase in palaeomagnetic field intensity 
around 1.3 billion years ago may not be related to the birth of 
the inner core. 

Extensive efforts have been made to measure the electrical resistivity 
of iron at high pressure since the earliest high-pressure mineral physics 
experiments, but its direct measurement under the conditions of 
Earth’s core is still challenging. Traditionally, the core resistivity was 
estimated to be 200-500 µΩ cm by a combination of static low-pressure, 
low-temperature measurements and shock compression data. 
However, measurements under shock-wave compression potentially 
overestimate the resistivity owing to defect production during shock 
impact. Recent density functional theory calculations predicted 
the core resistivity to be 20%-50% of these conventional estimates. The 
static high-pressure, low-temperature experiments also suggested a 
low core resistivity because the resistivity saturates at high temperature. 
These values suggest that Earth’s core has been cooling rapidly, 
which implies a young inner core and high initial core and mantle 
temperatures. However, the resistivity saturation and the resulting 
low electrical resistivity of Earth's core have not been verified by 
experiments at the relevant high-pressure, high-temperature conditions. 

We made use of advanced experimental techniques, including the 
shaping of the sample-and-electrode composites, to measure the elec- 
trical resistivity p of iron at ultrahigh pressure and temperature in a 
laser-heated diamond-anvil cell (DAC) (Extended Data Fig. 1). We 
first examined the temperature response of the resistivity at 26 GPa 
up to 2,610 K (Fig. 1a). Kinks in the temperature-resistivity curve 
are associated with phase changes from ε to γ and from γ to liquid; 
these phase changes were confirmed by concurrent synchrotron X-ray 
diffraction measurements (Extended Data Fig. 2). Our data at 26 GPa 
are in broad agreement with an earlier report at 15 GPa in a multi-anvil 
apparatus (ref. 17). We observed similar behaviour at 51 GPa up to 2,880 K 
in a separate run (Fig. 1b). A resistivity jump of about +20% upon 
melting was observed in these two runs, although it may include some 
uncertainty related to the measurement of liquid iron, which is difficult 
owing to a change in sample geometry at melting. Previous experiments 
performed below 7 GPa reported that the resistivity of iron increased 
by 5%-9% upon melting, while a recent theoretical study showed 
a 13%-20% resistivity difference between solid iron (the ε phase) 
and liquid iron at 330 GPa (ref. 15). 

Figure 1 | Change in electrical resistivity of iron upon phase transitions. 
a, Measurements from this study taken at 26 GPa up to 2,610 K (open 
circles) are compared with previous measurements in a multi-anvil press 
at 15 GPa (ref. 17) (crosses). b, Measurements from this study taken at 
51 GPa up to 2,880 K (grey circles). Error bars reflect the uncertainty in the 
high-pressure, high-temperature resistivity obtained (see Methods) and 1σ 
of the measured temperature variations. 

Figure 2 | Change in the resistivity of ε iron with increasing temperature. 
a, Temperature dependence of the resistivity up to 450 K at 75-212 GPa 
fitted by the Bloch-Grüneisen formula (coloured lines). Similar results 
at 65 GPa from ref. 7 are also shown. The resistivity value includes the 
uncertainty (not shown) derived from that in the reference high-pressure, 
room-temperature resistivity data (Extended Data Fig. 4), but the slope is 
obtained from the change in resistance, which was measured with very small 
errors (see Methods). b-f, Electrical resistivity measured at 80 GPa 
up to 1,820 K (b), at 106 GPa up to 2,540 K (c), at 115 GPa up to 4,490 K (d), 
at 140 GPa up to 2,490 K (e), and at 157 GPa up to 3,630 K (f). The resistivity 
measured at high pressure and high temperature in this study is lower than 
the prediction by the Bloch-Grüneisen formula, ρ_BG (dashed lines with grey 
uncertainty band) with parameters obtained from the high-pressure, low- 
temperature measurements in a. Such a low resistivity can be accounted for 
by the effect of resistivity saturation at high temperature (solid curves). 
Error bars are as in Fig. 1. 

We carried out more experiments both in a muffle furnace up to 
450 K (Fig. 2a) and in a laser-heated DAC up to 4,500 K (Fig. 2b-f) 
at a higher pressure range in which only the ε phase was found 
(Extended Data Fig. 3). The former measurements indicated that 
the resistivity increased linearly with increasing temperature. Such a 
temperature-resistivity slope below 450 K is expressed by the Bloch- 
Grüneisen law: 

$$\rho_{\mathrm{BG}}(V,T) = D(V)\left(\frac{T}{\Theta_{\mathrm{D}}(V)}\right)^{n}\int_{0}^{\Theta_{\mathrm{D}}(V)/T}\frac{z^{n}}{(e^{z}-1)(1-e^{-z})}\,\mathrm{d}z \qquad (1)$$


in which both the Debye temperature Θ_D and the volume V are available 
from the literature. Here n is an integer that depends upon the interaction 
of free electrons. Fitting the present data with equation (1) 
yields a material constant D(V) and n value at each pressure (Table 1). 
We found that the temperature T dependence became weaker with each 
pressure increment, corresponding to a reduction in n from 5.5 to 1.5, 
which is reasonably consistent with previous results. This suggests 
that increasing pressure diminishes impedance against free electron 
migration in iron. 

Table 1 | Parameters for the Bloch-Grüneisen formula 

P (GPa)   Θ_D (K)   D(V) (µΩ cm)   n 
65*       610       92.6(1)        5.9 
75        626       79.6(5)        5.5 
100       677       33.3(4)        3.3 
170       783       8.1(1)         1.8 
190       809       5.6(1)         1.6 
212       836       4.5(0)         1.5 

Θ_D was calculated from the equation of state of ε iron. Fitting errors in n are smaller than the first 
decimal place. 
*From ref. 7. 
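
The Bloch-Grüneisen expression in equation (1) has no closed form for non-integer n, so it is usually evaluated numerically. The sketch below is an illustration only, assuming the standard form of the integrand, z^n / ((e^z - 1)(1 - e^-z)); the function name, the substitution z = x² and the midpoint rule are choices made here, not the authors' procedure. The example inputs are the 212 GPa row of Table 1.

```python
import math


def bloch_gruneisen(temp, theta_d, d_const, n, steps=2000):
    """Evaluate rho_BG = D * (T/Theta_D)^n * integral_0^(Theta_D/T) of
    z^n / ((e^z - 1)(1 - e^-z)) dz by the midpoint rule.

    temp, theta_d: temperature and Debye temperature (K)
    d_const:       material constant D(V), in the units wanted for the result
    n:             exponent (must exceed 1 for the integral to converge)
    """
    upper = theta_d / temp
    x_max = math.sqrt(upper)   # substitute z = x^2 to soften the z -> 0 behaviour
    h = x_max / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h      # midpoints never touch x = 0
        z = x * x
        integrand = z ** n / (math.expm1(z) * -math.expm1(-z))
        total += integrand * 2.0 * x * h   # dz = 2 x dx
    return d_const * (temp / theta_d) ** n * total


if __name__ == "__main__":
    # Parameters for 212 GPa from Table 1: Theta_D = 836 K, D(V) = 4.5, n = 1.5.
    for temp in (300.0, 450.0, 4500.0):
        print(temp, bloch_gruneisen(temp, 836.0, 4.5, 1.5))
```

For temperatures well above Θ_D the integrand reduces to z^(n-2), so the result approaches D(V)·(T/Θ_D)/(n - 1), which gives a quick check on the numerical output.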

Using the D(V) and n values obtained from the high-pressure, 
low-temperature measurements, the Bloch-Grüneisen formula 
predicts the resistivity of iron at high-pressure, high-temperature 
conditions. However, the present experiments, performed up to 4,500 K 
in the range 80-157 GPa, demonstrate that the measured resistivity is 
clearly lower than the value predicted by the Bloch-Grüneisen law 
(ρ_BG) even when we consider all possible error sources (see Methods). 
In principle, electron-electron scattering may be important at very high 
temperatures. However, we observe no sign of resistivity enhancement 
due to electron-electron scattering with increasing temperature, 
at least up to 4,500 K (Fig. 2b-f). The low resistivity of ε iron observed 
in this study may be attributed to the well known effect of resistivity 
saturation at high temperature (in which the resistivity increase is 
suppressed at high temperature). The electrical resistivity of a metal 
asymptotically approaches the Ioffe-Regel value (that is, the saturation 
resistivity, ρ_sat) when the mean-free path of free electrons becomes 
comparable to the interatomic distance. Wiesmann et al. proposed 
an empirical description of the resistivity saturation in a simple shunt 
resistor model that can be applied to a variety of metals: 

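$$\frac{1}{\rho_{\mathrm{BG+sat}}(V,T)} = \frac{1}{\rho_{\mathrm{BG}}(V,T)} + \frac{1}{\rho_{\mathrm{sat}}(V)} \qquad (2)$$
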
The present temperature–resistivity data at each pressure are well explained by the shunt resistor model of equation (2) with a reasonable value of ρ_sat (Fig. 2b–f). The saturation resistivity ρ_sat should decrease under compression, because it depends on the interatomic spacing. Indeed, the fitting results show that ρ_sat diminishes from 142 μΩ cm at 80 GPa to 122 μΩ cm at 157 GPa, although the uncertainties are somewhat large. These values are in good accordance with the value measured at 1 bar (168 μΩ cm) (ref. 24) (Extended Data Table 1). Thus, the present results up to 4,500 K demonstrate the effect of resistivity saturation, which defines the upper bound for the resistivity and leads to the low electrical resistivity (high thermal conductivity) of Earth's core.
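The shunt resistor model treats the saturation resistivity as a resistor in parallel with the ideal (Bloch–Grüneisen) resistivity, so the total can never exceed ρ_sat. A minimal sketch follows, with hypothetical function names; the numerical inputs (ρ_sat of 123 μΩ cm and a measured 40.4 μΩ cm at 140 GPa and 3,750 K) are central values quoted elsewhere in the text.

```python
def shunt_resistivity(rho_bg, rho_sat):
    """Parallel ('shunt') combination of the ideal and saturation resistivities."""
    return 1.0 / (1.0 / rho_bg + 1.0 / rho_sat)


def implied_bloch_gruneisen(rho_total, rho_sat):
    """Invert the shunt model: the ideal resistivity implied by a measured total."""
    if rho_total >= rho_sat:
        raise ValueError("the shunt model requires rho_total < rho_sat")
    return 1.0 / (1.0 / rho_total - 1.0 / rho_sat)


if __name__ == "__main__":
    # However large the ideal term grows, the total stays below rho_sat.
    for rho_bg in (50.0, 500.0, 5000.0):
        print(rho_bg, round(shunt_resistivity(rho_bg, 123.0), 1))
    # Ideal resistivity implied by the measured value at 140 GPa and 3,750 K.
    print(round(implied_bloch_gruneisen(40.4, 123.0), 1))
```

The inverse function is included because the saturation fit effectively runs in that direction: a measured resistivity and a Bloch–Grüneisen prediction together constrain ρ_sat.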

The resistivity (ρ_BG+sat) of ε iron at 212 GPa is shown as a function of temperature and compared to previous estimates in Fig. 3. The temperature–resistivity curve is calculated using the shunt resistor model (equation (2)), with D(V) = 4.5 μΩ cm, n = 1.5, and ρ_sat = 6's μΩ cm, which were measured at 212 GPa in this study (Table 1) or estimated by linear extrapolation of the ρ_sat versus (V/V_0)^(1/3) relation (Extended Data Table 1). Our estimate gives the lowest value because of the effect of resistivity saturation, which was not considered in earlier models of core conductivity. The values of resistivity of liquid iron predicted by de Koker et al. and Pozzo et al. agree well with the present value for solid iron when we consider a ~20% increase upon melting (Fig. 3).

Figure 3 | Iron resistivity at 212 GPa and high temperatures. Comparison of the present results of iron resistivity at 212 GPa (black solid curve with grey uncertainty band) with previous modelling (triangles), a shock compression study (circle) and density functional theory calculations (square, star).

The present data demonstrate the electrical resistivity of iron to be 40.4 μΩ cm at 140 GPa and 3,750 K (Fig. 2e), close to the core–mantle boundary conditions. This corresponds to an electronic thermal conductivity κ_el = 226 W m⁻¹ K⁻¹ when the Wiedemann–Franz relation (κ_el = L_0 T/ρ, for the ideal Lorenz number L_0 = 2.44 × 10⁻⁸ W Ω K⁻²) is applied. The Lorenz number for iron at core–mantle boundary conditions recently computed by theoretical studies shows less than +3%/−6% difference from the ideal L_0 value.
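The conversion from resistivity to electronic thermal conductivity is a one-line application of the Wiedemann–Franz relation. The sketch below reproduces the quoted value from the central resistivity of 40.4 μΩ cm at 3,750 K; the function name and the unit handling are illustrative choices.

```python
L0_IDEAL = 2.44e-8  # ideal Lorenz number, W Ohm K^-2


def wiedemann_franz(rho_micro_ohm_cm, temperature_k, lorenz=L0_IDEAL):
    """Electronic thermal conductivity (W m^-1 K^-1) from resistivity in micro-ohm cm."""
    rho_ohm_m = rho_micro_ohm_cm * 1e-8  # 1 micro-ohm cm = 1e-8 ohm m
    return lorenz * temperature_k / rho_ohm_m


if __name__ == "__main__":
    print(round(wiedemann_franz(40.4, 3750.0)))
```

Because κ_el scales as 1/ρ, a resistivity that saturates at high temperature translates directly into a conductivity that keeps rising with T.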

Since Earth's core contains some nickel and light elements in addition to iron, we consider the effect of such impurity elements. The impurity resistivity in iron has been measured for silicon and nickel at high-pressure, room-temperature (300 K) conditions (see Methods). From Matthiessen's rule, the resistivity of solid Fe67.5Ni10Si22.5, a possible outer core composition inferred from its density, is calculated to be 86.9 μΩ cm at 140 GPa and 3,750 K, considering the saturation effect. When the 20% resistivity increase upon melting is taken into account, we obtain 104 μΩ cm and thus a thermal conductivity of 88 W m⁻¹ K⁻¹ for liquid Fe67.5Ni10Si22.5.

Recent modelling of core thermal evolution using the value for thermal conductivity of solid Fe77.5Si22.5 obtained by Gomi et al. demonstrates that Earth's core has been cooling quickly and that the inner core is less than 0.7 billion years old. Our estimate of core thermal conductivity at the core–mantle boundary coincides with the 90 W m⁻¹ K⁻¹ reported by Gomi et al., although they assumed a higher saturation resistivity and did not consider the effect of melting. As a consequence, the present study supports a young age for the inner core.

Biggin et al. found that the intensity and variability of the geomagnetic field were enhanced around 1.3 billion years ago and attributed this enhancement to the beginning of nucleation of the solid inner core. Such a remarkable change in the palaeomagnetic field, however, may have been caused by another effect, such as a change in spatial variation in the core–mantle boundary heat flux. In addition, Earth's magnetic field has been present over a long geological time (since 4.2 billion years ago) and it has been assumed that the geomagnetic field was induced by thermal convection in the core before the onset of inner-core crystallization. The high thermal conductivity obtained in this study, however, suggests that this is not the case, because otherwise the core must have been unrealistically hot in the early history of Earth. Alternatively, core convection in the absence of an inner core might have been driven or assisted by libration, precession and tides. The precipitation of magnesium-bearing minerals from the core could also have provided an additional energy source to promote core convection.

Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


High-pressure, high-temperature resistance measurements. We used symmetric-type DACs with 40-μm, 90-μm, 120-μm, 150-μm and 300-μm culet diamond anvils to generate high pressures. The sample was iron foil (99.99% purity), the same as that used in Gomi et al. The iron foil was shaped into a single member with four probes by using a focused ion beam apparatus (JEOL JIB-4000 and FEI Versa 3D) (Extended Data Fig. 1a). This shaping technique enables us to prepare samples with uniform geometry corresponding to each anvil culet size. The pressure medium was fine-grained SiO2 glass, KCl or Al2O3, which also acted as a thermal insulator during laser heating. They were loaded into a sample chamber at the centre of an insulated gasket consisting of rhenium and cubic boron nitride + epoxy powder. Four electrical leads made of platinum were connected to each iron lead outside the sample chamber (Extended Data Fig. 1b). The electrical resistance of the iron sample was measured using the four-terminal method to eliminate the large errors associated with lead resistance, using a Multimeter (Keithley 2000) or a SourceMeter (Keithley 2450) under a constant direct current of 100 mA.

Heating experiments were conducted in a muffle furnace up to 450 K (Fig. 2a). Sample temperature was monitored by a thermocouple. We obtained pressure at room temperature based on the Raman spectrum of a diamond anvil. We also measured the temperature response of the electrical resistance of iron at high pressure in a laser-heated DAC (Figs 1 and 2b–f). The sample was heated with a pair of 100-W single-mode Yb fibre lasers using a double-side heating system at BL10XU, SPring-8. The laser-heated spot was 40 μm across, which was larger than the distance between the two potential leads (Extended Data Fig. 1c). Temperature was obtained by a spectroradiometric method. The variations in temperature within the area of resistance measurement were less than ±10%. Since temperature heterogeneity in a heated area of the iron sample generates thermoelectric power, we measured the voltages of iron samples while passing direct current in both directions (I₊ to I₋ and I₋ to I₊) and averaged these two voltage values to eliminate the effect of thermoelectric power (Extended Data Fig. 1b, c).
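The current-reversal step can be sketched as follows. A thermoelectric offset keeps its sign when the current is reversed while the resistive drop flips, so with signed voltages half the difference of the two readings isolates the resistive term; this is equivalent to averaging the two voltage magnitudes. The signed-voltage convention, the function name and the synthetic numbers are assumptions made for illustration.

```python
def resistance_with_reversal(v_forward, v_reverse, current):
    """Four-terminal resistance from forward and reversed-current voltages.

    v_forward = +I*R + V_thermo and v_reverse = -I*R + V_thermo (signed readings),
    so the thermoelectric offset cancels in the difference.
    """
    return (v_forward - v_reverse) / (2.0 * current)


if __name__ == "__main__":
    current = 0.1        # A, the constant direct current quoted in the Methods
    true_r = 0.05        # ohm, hypothetical sample resistance
    v_thermo = 2.0e-4    # V, hypothetical thermoelectric offset
    v_fwd = current * true_r + v_thermo
    v_rev = -current * true_r + v_thermo
    print(resistance_with_reversal(v_fwd, v_rev, current))
```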

Concurrently with all high-pressure, high-temperature resistance measurements, we performed synchrotron X-ray diffraction (XRD) measurements at BL10XU, SPring-8 (Extended Data Fig. 2). Pressure was calculated from the unit-cell volumes of ε and γ iron and their pressure–volume–temperature equations of state (Extended Data Fig. 3). The pressures given in Figs 1 and 2 are those at 300 K. They increased slightly with increasing temperature, and the pressure increase itself diminished the resistivity, but this was corrected for in order to examine the effect of temperature on iron resistivity at a given pressure (see below).
Estimation of resistivity. The electrical resistivity of iron at high pressure and high temperature was calculated from the ratio of the resistance measured at high pressure and high temperature to that measured at high pressure and room temperature, multiplied by the high-pressure, room-temperature resistivity of ε iron previously obtained by Gomi et al. Extended Data Fig. 4 compiles previously reported high-pressure, room-temperature resistivity values for ε iron. Some difference is found at relatively low pressures, but all of the experimental and theoretical determinations are consistent with each other above ~80 GPa. The present XRD analysis showed that sample pressure increased on laser heating in a DAC. The resistivity measured at high pressure and high temperature was corrected for the effect of such a pressure increase. To examine the temperature response of iron resistivity at constant pressure, we first estimated the relative reduction in iron resistivity with a certain pressure increment based on high-pressure, room-temperature resistivity data and then corrected the measured high-pressure, high-temperature resistivity value with that rate of reduction.
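The two steps described above (scaling a reference resistivity by a resistance ratio, then undoing the effect of thermal pressure) can be written compactly. This is a sketch under stated assumptions: the function names are hypothetical, the resistances and the room-temperature resistivity at the elevated pressure are made-up illustrative numbers, and the pressure correction is assumed to be a simple multiplicative rescaling by the room-temperature resistivity ratio.

```python
def resistivity_from_ratio(r_hot, r_room, rho_room_ref):
    """rho(P, T) = R(P, T) / R(P, 300 K) * rho_ref(P, 300 K)."""
    return r_hot / r_room * rho_room_ref


def correct_for_pressure_increase(rho_hot, rho_ref_nominal, rho_ref_actual):
    """Rescale a hot measurement taken at a slightly higher (thermal) pressure
    back to the nominal pressure, using room-temperature reference resistivities."""
    return rho_hot * rho_ref_nominal / rho_ref_actual


if __name__ == "__main__":
    # Hypothetical resistances (ohm); 3.9 micro-ohm cm is the central 300 K value at 140 GPa.
    rho_hot = resistivity_from_ratio(r_hot=0.518, r_room=0.050, rho_room_ref=3.9)
    # Hypothetical: the reference resistivity is 3.8 micro-ohm cm at the hot pressure.
    print(round(correct_for_pressure_increase(rho_hot, 3.9, 3.8), 1))
```

Working with the ratio means the sample geometry cancels, which is why the Methods stress that the geometry did not change during heating.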

From the high-pressure, low-temperature (up to 450 K) data, we obtained the slope of the temperature–resistivity relation in a wide pressure range (65–212 GPa) and converted it into the D(V) and n parameters of the Bloch–Grüneisen formula, which can be compared with previous estimates. We assumed D(V) and n to be temperature-independent. The saturation resistivity was calculated up to 157 GPa on the basis of the measured high-pressure, high-temperature resistivity data and the calculated Bloch–Grüneisen values, taking all possible error sources into account (Extended Data Table 1). Using these Bloch–Grüneisen values and the saturation resistivity at each pressure, we calculated the resistivities of ε iron up to >6,000 K at 212 GPa (Fig. 3) and of Fe67.5Ni10Si22.5 to 3,750 K at 140 GPa on the basis of a shunt resistor model (see equation (2)).

The uncertainty in the present high-pressure, high-temperature electrical resistivity measurement is derived from the following uncertainties: (1) in the measured resistance (only 0.012% error when using Keithley 2000 and 2450); (2) caused by volume expansion upon heating; and (3) in the reference high-pressure, room-temperature resistivity of ε iron determined by Gomi et al. We confirmed in each experiment, under an optical microscope, that the geometry of the iron sample did not change before and after laser heating (Extended Data Fig. 1b). Indeed, iron resistance at 300 K before heating was identical to that after heating. XRD patterns collected before and after the high-pressure, high-temperature resistivity measurements indicate that no chemical reaction occurred during laser heating (Extended Data Fig. 2b). The volume expansion observed by XRD data taken at high pressure and high temperature caused an underestimation of resistivity by <0.3%. The error in the reference high-pressure, room-temperature data is the main source of uncertainty; the resistivity of iron is 3.9 μΩ cm at 140 GPa and 300 K (Extended Data Fig. 4). The error bars for the present high-pressure, high-temperature resistivity data in Figs 1 and 2 include all of these uncertainties.
Temperature distribution within a laser-heated sample. We simulated a temperature distribution within a laser-heated sample under high-pressure, high-temperature conditions (Extended Data Fig. 5). A steady-state heat conduction equation was employed:

$$\nabla\cdot\big(\kappa(T)\,\nabla T(x,y,z)\big) + A(x,y,z) = 0 \qquad (3)$$

where κ(T) is thermal conductivity and A(x, y, z) is the energy flux from a laser, considering the heat balance at the sample surface.
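To give a feel for how such a steady-state problem is solved numerically, here is a deliberately simplified one-dimensional sketch: constant conductivity, a uniform volumetric source, and fixed temperatures at both ends. None of these simplifications comes from the Methods, which describe a three-dimensional geometry with temperature-dependent conductivity, and every numerical value below is hypothetical.

```python
def solve_steady_1d(kappa, source, length, n, t_edge):
    """Solve kappa * T'' + source = 0 on [0, length] with T = t_edge at both ends.

    Uses central differences on n interior nodes and the Thomas algorithm
    for the resulting tridiagonal system (-1, 2, -1).
    """
    h = length / (n + 1)
    rhs = [source * h * h / kappa] * n
    rhs[0] += t_edge
    rhs[-1] += t_edge
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = -0.5
    d_prime[0] = rhs[0] / 2.0
    for i in range(1, n):
        denom = 2.0 + c_prime[i - 1]  # b - a*c'[i-1] with a = -1, b = 2
        c_prime[i] = -1.0 / denom
        d_prime[i] = (rhs[i] + d_prime[i - 1]) / denom
    temps = [0.0] * n
    temps[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        temps[i] = d_prime[i] - c_prime[i] * temps[i + 1]
    return temps


if __name__ == "__main__":
    # Hypothetical values: 10-micrometre strip, kappa = 100 W m^-1 K^-1, and a source
    # chosen so that the analytic centre-to-edge rise, source*L^2/(8*kappa), is 200 K.
    temps = solve_steady_1d(kappa=100.0, source=1.6e15, length=10e-6, n=99, t_edge=4300.0)
    print(round(max(temps) - 4300.0, 3))
```

The analytic rise source·L²/(8κ) shows why a higher sample conductivity gives a flatter temperature profile, which is the sensitivity the Methods probe by also testing a conductivity one-third as large.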

Here we calculated the temperature distribution in an iron sample sandwiched by thermal insulation layers of Al2O3 at 115 GPa and 4,500 K, the conditions being the same as those for the resistance measurement (Fig. 2d). The width, length (separation between the two potential leads, V₊ and V₋) and thickness of the iron sample were 10 μm, 2 μm and 1 μm, respectively (the thickness was measured with a microprobe after decompression to 1 bar) (Extended Data Fig. 1b). Other dimensions were: thickness of Al2O3 layers, 2 μm; diameter of the sample chamber, 40 μm; culet and height of diamond anvils, 120 μm and 2 mm, respectively; laser beam size, 30 μm at full-width at half-maximum. We used the value for the temperature-dependent thermal conductivity of iron at 115 GPa given in Fig. 2d. A low thermal conductivity of iron, one-third of the present value and similar to the estimate by ref. 2, was also considered. The conductivities of the surrounding materials that we used were: single-crystal diamond anvil, 500 W m⁻¹ K⁻¹ (ref. 35); Al2O3 thermal insulator, 2.5 W m⁻¹ K⁻¹ (ref. 36) (for the very fine powdered Al2O3 used in the present experiments, the thermal resistance at the grain boundary is very large, making the bulk thermal conductivity much lower than that of a single crystal). Since there is no report for the thermal conductivity of the cubic boron nitride + epoxy mixture, we assumed it to be similar to the value of 60 W m⁻¹ K⁻¹ for polycrystalline hexagonal boron nitride at 1 bar (ref. 37). Here we did not consider pressure and temperature effects on the thermal conductivity of these surrounding materials. However, the thermal conductivity of these insulators increases with increasing pressure, while it decreases with increasing temperature, and these effects cancel out at high-pressure, high-temperature conditions.

With 4,500 K as a peak temperature at 115 GPa, our simulations show that the maximum temperature difference in the iron sample within the area for resistance measurement is about 200 K (Extended Data Fig. 5), smaller than the uncertainty in temperature determination. Even when we use the low thermal conductivity for iron, the temperature difference is around 300 K. Such a temperature difference of 200–300 K changes the resistivity of iron by only ~3% at 115 GPa at around 4,000 K. These heat conduction calculations indicate that the observed low iron resistivity was not derived from a strong temperature gradient within a sample but is an intrinsic electrical property of iron.
Impurity effect on electrical resistivity of iron. Earth's core consists not only of iron but also of nickel and light element(s) such as silicon, oxygen, sulphur, carbon and hydrogen. These impurity elements in the core can be considered as additional scatterers for free electrons in metal. The effect of dilute impurity elements can be expressed by Matthiessen's rule:

$$\rho_{\mathrm{Fe\text{-}alloy}}(V,T) = \rho_{\mathrm{pure\text{-}Fe}}(V,T) + \sum_k \rho_{i,k}\cdot\chi_k \qquad (4)$$

where ρ_Fe-alloy, ρ_pure-Fe, ρ_i,k and χ_k are the electrical resistivities of iron alloy and pure iron, the impurity resistivity of element k, and the concentration of element k in iron, respectively. Matthiessen's rule indicates that element k contributes proportionally to its concentration χ_k independently of temperature. Note that this rule does not consider the effect of resistivity saturation, although the measured electrical resistivity of the Fe–Si alloy does show saturation phenomena. The high-temperature electrical resistivity, taking the resistivity saturation into account, is well described by a shunt resistor model (see equation (2) and ref. 23).

The impurity resistivity of Si in iron at high pressure has been examined using high-pressure experiments. According to Gomi et al., the impurity resistivity of Si, ρ_Si(V), in units of micro-ohm centimetres per atomic per cent, was formulated as:

$$\rho_{\mathrm{Si}}(V) = F_1\left(F_2 - \frac{V}{V_0}\right)^{F_3} \qquad (5)$$

where V is the volume of ε iron at high pressure and V_0 is the volume of ε iron at 1 bar. The fitting parameters in equation (5) were determined to be
F_1 = 3.77 μΩ cm/at%, F_2 = 1.48 and F_3 = −3 9°55. In addition, Gomi and Hirose reported the volume-dependent impurity resistivity of Ni in ε iron at high pressures, ρ_Ni(V), in units of micro-ohm centimetres per atomic per cent:

$$\rho_{\mathrm{Ni}}(V) = F_4\left(F_5 - \frac{V}{V_0}\right)^{F_6} \qquad (6)$$

where F_4 = 7.25 × 10³ μΩ cm/at%, F_5 = 3.51 and F_6 = −8.06.

At 140 GPa, the impurity resistivities of Si and Ni are calculated to be 9.60 μΩ cm/at% and 1.97 μΩ cm/at%, respectively, according to equations (5) and (6). Taking into account the resistivity saturation effect, we calculated the electrical resistivity of solid Fe67.5Ni10Si22.5, the composition accounting for the outer core density, at 140 GPa. Since the saturation resistivity of solid Fe67.5Ni10Si22.5 at 140 GPa is not known, we assume it to be 123 μΩ cm, the same as that for pure iron as determined in this study.
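The alloy estimate can be retraced with a few lines of arithmetic. The sketch below assumes one particular order of operations that is not spelled out in the text: remove saturation from the measured pure-iron value to recover its ideal resistivity, add the impurity terms via Matthiessen's rule, reapply the shunt model, then apply the 20% melting increase and the Wiedemann–Franz relation. All inputs are central values read from the text, and the variable names are illustrative.

```python
L0_IDEAL = 2.44e-8        # ideal Lorenz number, W Ohm K^-2
TEMPERATURE = 3750.0      # K
RHO_FE_MEASURED = 40.4    # micro-ohm cm, pure iron at 140 GPa and 3,750 K
RHO_SAT = 123.0           # micro-ohm cm, saturation resistivity assumed for the alloy
RHO_SI, X_SI = 9.60, 22.5  # impurity resistivity (micro-ohm cm/at%) and content (at%)
RHO_NI, X_NI = 1.97, 10.0


def parallel(a, b):
    """Shunt (parallel) combination of two resistivities."""
    return 1.0 / (1.0 / a + 1.0 / b)


# Ideal (unsaturated) pure-iron resistivity implied by the measured value.
rho_fe_ideal = 1.0 / (1.0 / RHO_FE_MEASURED - 1.0 / RHO_SAT)
# Matthiessen's rule: add impurity contributions to the ideal term.
rho_alloy_ideal = rho_fe_ideal + RHO_SI * X_SI + RHO_NI * X_NI
# Reapply saturation, then the ~20% increase on melting.
rho_alloy_solid = parallel(rho_alloy_ideal, RHO_SAT)
rho_alloy_liquid = 1.2 * rho_alloy_solid
# Wiedemann-Franz conversion (1 micro-ohm cm = 1e-8 ohm m).
k_liquid = L0_IDEAL * TEMPERATURE / (rho_alloy_liquid * 1e-8)

if __name__ == "__main__":
    print(round(rho_alloy_solid, 1), round(rho_alloy_liquid), round(k_liquid))
```

Under these assumptions the sketch lands on the quoted 86.9 μΩ cm, 104 μΩ cm and 88 W m⁻¹ K⁻¹. That supports the assumed order of operations but does not prove it is the authors' exact procedure.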
Extended Data Figure 1 | Images of iron sample and sample configuration for electrical resistance measurements in a DAC. a, A composite of an iron sample and electrodes shaped by a focused ion beam. b, c, Photomicrographs of a sample chamber viewed through a diamond anvil at 115 GPa and 300 K (b) and 3,700 K (c). The four-probe method was used for electrical resistance measurements. At each set of pressure and temperature conditions, we measured the voltage difference between two potential leads (V₊ and V₋) twice, when electric current passed through the sample from a positive current (I₊) lead to a negative current (I₋) lead and in the opposite direction. These two voltage values were averaged to eliminate thermal voltage, and the resistance is calculated from Ohm's law.
Extended Data Figure 2 | XRD patterns of iron samples at high pressures and temperatures. a, Data collected at 26 GPa (see Fig. 1a for resistivity measurement), showing the diffraction peaks of ε Fe, γ Fe, and Al2O3 pressure medium (Cor.). Liquid Fe is indicated by a diffuse signal at 2θ = 10° to 14°. b, ε Fe at 140 GPa (Fig. 2e). Part of the SiO2 glass pressure medium crystallized into a CaCl2-type phase (labelled as CS) when
Extended Data Figure 3 | Pressure and temperature conditions of electrical resistivity measurements of iron. Circles show the conditions of laser-heated DAC experiments (Figs 1 and 2b–f), and squares indicate those for measurements in a muffle furnace up to 450 K (Fig. 2a). Phase boundaries are from the literature. The symbol colours in this figure correspond to those in Figs 1 and 2. Error bars, 1σ.
Extended Data Figure 4 | Electrical resistivity of ε iron at high pressure and 300 K. Bold black and blue curves show the results of previous DAC experiments by Gomi et al. and Seagle et al., respectively, with uncertainties shown by bands. The red curve connecting crosses is from theoretical calculations. All these results are consistent with each other above ~80 GPa. For comparison, the resistivity of ε iron at 1 bar deduced from the measurements of hexagonal close-packed (hcp) Fe–Os alloy and at low pressures in a DAC are also shown.
Extended Data Figure 5 | Temperature maps of the iron sample and electrodes in a laser-heated DAC at 115 GPa and 4,500 K. Al2O3 is the pressure medium. The top panel shows the temperature map viewed along the compression axis. The bottom panel shows the temperature map of the cross-section of a sample chamber and a gasket. The maximum temperature difference in the area for resistance measurement is 200 K, smaller than the uncertainty in the temperature determination.
Extended Data Table 1 | Saturation resistivity ρ_sat of ε iron at high pressures

P (GPa)*   (V/V_0)^(1/3)   ρ_sat (μΩ cm)    Reference
0          1.000           168              ref. 24†
80         0.919           142(+32/−26)     This study
106        0.904           136(+53/−27)     This study
114        0.901           124(+35/−26)     This study
140        0.890           123(+42/−28)     This study
157        0.883           122(+23/−29)     This study

*Calculated from the equation of state of ε iron.
†Ref. 24 determined the ρ_sat of iron to be 168 μΩ cm.
Direct measurement of thermal conductivity in solid iron at planetary core conditions


Zuzana Konôpková, R. Stewart McWilliams, Natalia Gómez-Pérez & Alexander F. Goncharov


The conduction of heat through minerals and melts at extreme pressures and temperatures is of central importance to the evolution and dynamics of planets. In the cooling Earth's core, the thermal conductivity of iron alloys defines the adiabatic heat flux and therefore the thermal and compositional energy available to support the production of Earth's magnetic field via dynamo action. Attempts to describe thermal transport in Earth's core have been problematic, with predictions of high thermal conductivity at odds with traditional geophysical models and direct evidence for a primordial magnetic field in the rock record. Measurements of core heat transport are needed to resolve this difference. Here we present direct measurements of the thermal conductivity of solid iron at pressure and temperature conditions relevant to the cores of Mercury-sized to Earth-sized planets, using a dynamically laser-heated diamond-anvil cell. Our measurements place the thermal conductivity of Earth's core near the low end of previous estimates, at 18–44 watts per metre per kelvin. The result is in agreement with palaeomagnetic measurements indicating that Earth's geodynamo has persisted since the beginning of Earth's history, and allows for a solid inner core as old as the dynamo.

The thermal evolution of Earth's core and the energetics of the geomagnetic field are highly sensitive to the thermal conductivity of core materials at the high pressures (P) and high temperatures (T) of the core. A wide range of values for the thermal conductivity of iron (Fe) and its alloys at core conditions have been predicted using materials theory and high-pressure measurements of electrical conductivity. To predict thermal conductivity, the Wiedemann–Franz–Lorenz law:

$$k = L T \sigma \qquad (1)$$

has almost universally been employed, where k and σ are the thermal and electrical conductivities and L is the Lorenz number. The Lorenz number, traditionally an empirically determined quantity, has been calculated theoretically but not measured for Fe or its alloys at high pressure and temperature conditions.

For low estimates of thermal conductivity, near k = 30 W m⁻¹ K⁻¹, the geodynamo may be sustained during the whole life of the planet, and convection of the core is readily attained in thermal (in the absence of an inner core) or thermochemical scenarios. On the other hand, a recent estimate near k = 130 W m⁻¹ K⁻¹ implies a young inner core (that is, less than 1.3 billion years old), and only thermal convection driving the dynamo at earlier times. However, a paradox arises when evidence of an ancient magnetic field must be reconciled with the high energy fluxes needed to drive thermal convection in a high-conductivity, fully fluid core. The large core–mantle boundary heat flux (Q_CMB) and high internal temperatures for the early Earth in this case (implying a molten lower mantle and possibly a stably stratified core) are difficult to explain given current mantle evolution models and the low present-day Q_CMB (ref. 3). To re-evaluate the history and energy balances of Earth's core and mantle in this context, it is necessary to be certain of the validity of reported values of k (ref. 8). Thus, there is a pressing need for direct thermal conductivity measurements of core materials at conditions relevant to Earth's core.


Figure 1 | Temperature of Fe foils during flash heating at high initial temperature and pressure. a, b, Plots of the measured temperature histories (grey) on the pulsed and opposite sides of the foil together with finite-element models (red) for best-fit thermal conductivity k of Fe, at two pressures: a, P = 48 GPa; b, P = 130 GPa. c, Instantaneous temperature map of the modelled sample area at initiation of flash heating at 112 GPa, as a function of radial (r) and axial (z) position. Contour lines are isotherms.
Although the technical capability of reaching planetary core conditions in the laboratory has long been available using the laser-heated diamond-anvil cell (DAC), measurements sensitive to transport properties have been scarce. Thermal transport measurements have been especially challenging. To overcome this limitation, we dynamically measured temperature in the laser-heated DAC to study the propagation of heat pulses across Fe foils contained at high initial pressure (35–130 GPa) and temperature (1,600–3,000 K) (Fig. 1). Fitting of the temporally and spatially resolved temperature fluctuations with heat conduction models provides a strong constraint on the thermal transport (Methods and Extended Data Figs 2–6).

The experiments performed below ~50 GPa probe Fe in the stability field of face-centred cubic γ Fe (Fig. 2). At conditions close to those at the centre of Mercury's core (~40 GPa and 2,200–2,500 K), thermal conductivity is 35 ± 10 W m⁻¹ K⁻¹. This is similar to the ambient pressure values in γ Fe (k = 30 ± 3 W m⁻¹ K⁻¹), suggesting that k is not strongly dependent on pressure at Mercury's core conditions. This result is similar to earlier expectations for the thermal conductivity of Mercury's core of ~40 W m⁻¹ K⁻¹, but is at odds with more recent estimates. At pressures in the range 50–80 GPa, the sample is usually pre-heated in the hexagonal close-packed ε Fe phase but may undergo partial transformation to the γ phase during the thermal pulse. Thermal conductivity values found at these conditions are considered biased towards the ε phase, and are in general agreement with earlier DAC measurements on ε Fe (ref. 26). The highest-pressure data, 88–130 GPa at 1,600–3,500 K, are unambiguously in the region of ε Fe and are closest to the conditions at Earth's core–mantle boundary: 136 GPa and 3,800–4,800 K. A large number of measurements (>20) at 112 GPa show k to decrease with temperature at these conditions (Fig. 3), as expected from combining electrical conductivity data under static and shock wave compression.

To model the temperature dependence of thermal conductivity in ε Fe, we fitted the data at 112 GPa to:

$$k = aT + \frac{b}{\sqrt{T}} \qquad (2)$$

This form ensures a realistic behaviour of both thermal conductivity and electrical resistivity (1/σ) that is consistent with previous high-temperature resistivity data (see Methods and Extended Data Fig. 1). The model fit at 112 GPa (Fig. 3) also includes resistivity data at room temperature extrapolated to 112 GPa and shock wave resistivity data interpolated to 112 GPa. These data were converted to thermal conductivity using an empirical Lorenz number of (1.9 ± 0.4) × 10⁻⁸ W Ω K⁻² (see Methods). The fit of equation (2) yields b ≈ 1,972 W m⁻¹ K^(−1/2) and a ≈ 0. The error in model thermal conductivities is ~20% (one standard deviation).
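A short numerical sketch of the fitted temperature dependence follows. It assumes equation (2) has the form k = aT + b/√T, which is the reading consistent with the quoted b and with conductivities of a few tens of W m⁻¹ K⁻¹, and it uses the empirical Lorenz number quoted above to express the same model as a resistivity. The function names are illustrative.

```python
import math

A_FIT = 0.0               # W m^-1 K^-2, fitted value (approximately zero)
B_FIT = 1972.0            # W m^-1 K^-1/2, fitted value at 112 GPa
L_EMPIRICAL = 1.9e-8      # W Ohm K^-2, empirical Lorenz number from the text


def k_model(temperature_k, a=A_FIT, b=B_FIT):
    """Thermal conductivity (W m^-1 K^-1) from the assumed form k = a*T + b/sqrt(T)."""
    return a * temperature_k + b / math.sqrt(temperature_k)


def implied_resistivity(temperature_k, lorenz=L_EMPIRICAL):
    """Electrical resistivity (ohm m) implied by k = L*T*sigma, i.e. rho = L*T/k."""
    return lorenz * temperature_k / k_model(temperature_k)


if __name__ == "__main__":
    for t in (1600.0, 2500.0, 3500.0):
        print(t, round(k_model(t), 1), implied_resistivity(t))
```

With a ≈ 0 the model gives k falling as T^(−1/2) and resistivity rising as T^(3/2), matching the statement that k decreases with temperature at 112 GPa.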

To assess the pressure variation of k in ε Fe, we used a physical model for the variation of electronic thermal conductivity with pressure (see Methods) in terms of isothermal bulk modulus (K_T) and Grüneisen parameter (γ):
The Grüneisen parameter and bulk modulus at core conditions are evaluated using the thermal equation of state of Fe (ref. 27) (see Methods). The model represents our data well to 130 GPa (Fig. 2), and predicts somewhat larger values of k at Earth's outer core conditions (Fig. 3). Accounting for the uncertainty in outer core temperature, k for pure Fe varies from 33 ± 7 W m⁻¹ K⁻¹ at core–mantle boundary conditions (T = 3,800–4,800 K, P = 136 GPa) to 46 ± 9 W m⁻¹ K⁻¹ at inner-core boundary conditions (T = 5,600–6,500 K, P = 330 GPa). The conductivity of molten Fe, which is relevant to the outer core, is generally taken to be similar to that of solid Fe near melting. The addition of light-element impurities is expected to reduce conductivity by 10%–40% (refs 7 and 13). Thus, the thermal conductivity for Earth's liquid outer core is between 25 ± 7 W m⁻¹ K⁻¹ at the core–mantle boundary and 35 ± 10 W m⁻¹ K⁻¹ at the inner-core boundary. Refining estimates for liquid core composition can further reduce this uncertainty. The corresponding electrical resistivity of the outer core is 3.7 ± 1.5 μΩ m.

Figure 2 | Thermal conductivity of Fe at high pressure and temperature. a, Phase diagram of Fe with conditions of the thermal conductivity measurements (orange) falling in the domain of the γ and ε phases. The shaded areas depict conditions of Earth's core and Mercury's core, with the vertical dashed line marking the pressure at Earth's core–mantle boundary (CMB). b, Thermal conductivity results from this study are shown as solid symbols: in the domain of γ Fe (upward triangles) the γ and ε phases most probably co-exist; for samples typically pre-heated to below the γ–ε boundary and then crossing it briefly during thermal pulses (diamonds), the phase is considered to be mostly ε Fe; at higher pressure (downward triangles) samples are pure ε Fe at all conditions (see Methods). Prior direct thermal conductivity measurements on the γ phase and the ε phase are shown as open symbols. The dashed lines are linear fits to the results from the γ and ε domains, whereas solid lines are model values (see equations (2) and (3)). Error bars include uncertainty (one standard deviation) and range of measurements.

Our thermal conductivities for pure Fe at core conditions compare 
well with predictions based on resistivity measurements at high pres- 
sure’ including shock wave results (52 +11 Wm! K7') or Stacey’s 
law of constant resistivity at melting? (48 + 10 W m~'K™'), where the 
empirical value of L has been applied. Such predictions are sensitive to 
the assumptions used, however, and much larger values are found using 
slightly different approaches*’?4, emphasizing the need for direct con- 
straints from high-pressure, high-temperature data. Calculations®” 
finding k= 120-160 Wm“! K~! at core-mantle boundary conditions 
Figure 3 | Thermal conductivity of Fe versus temperature. Solid circles
indicate results from this study at 112 GPa, with horizontal bars indicating
the range of temperatures observed in each experiment, and vertical bars
the uncertainty in k (one standard deviation). Estimates based on prior
electrical resistivity measurements⁵,¹⁴,¹⁵ are shown as open symbols, with
bars indicating uncertainty from the empirical determination of L.
The thermal conductivity models for 112 GPa, 136 GPa (core-mantle
boundary) and 330 GPa (inner-core boundary) are the blue, green and
red lines, respectively (see equations (2) and (3)). For comparison, the
prediction of ref. 2 for core alloy at outer-core conditions is the grey line.


and k = 205–250 W m⁻¹ K⁻¹ at inner-core boundary conditions are
5.6 ± 1.8 and 6.5 ± 1.7 times larger than our values, respectively.
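
The quoted factors can be checked with a few lines of arithmetic. The sketch below assumes that each computed range is represented by its midpoint and that the central values of this study (25 and 35 W m⁻¹ K⁻¹) are used; the variable names are illustrative, and the quoted uncertainties on the ratios are not reproduced.

```python
# Midpoints of the computed conductivity ranges quoted in the text (W m^-1 K^-1).
# Taking the midpoint of each range is an assumption made for this check.
calc_cmb = (120.0 + 160.0) / 2.0   # core-mantle boundary calculations
calc_icb = (205.0 + 250.0) / 2.0   # inner-core boundary calculations

# Central values for the liquid outer core from this study (W m^-1 K^-1).
meas_cmb = 25.0
meas_icb = 35.0

ratio_cmb = calc_cmb / meas_cmb
ratio_icb = calc_icb / meas_icb

print(f"CMB ratio: {ratio_cmb:.1f}")   # CMB ratio: 5.6
print(f"ICB ratio: {ratio_icb:.1f}")   # ICB ratio: 6.5
```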

During an early stage of Earth history before the formation of the
inner core, the presence of the geodynamo requires a core-mantle
boundary heat flux (Q_CMB) greater than the conductive heat flux
in the core. The heat flux requirements for such a convective early
core are moderate for the values of k found in this study, similar to
that of ref. 9: Q_CMB must exceed a threshold of 3.8 ± 1.6 TW (for k of
31 ± 13 W m⁻¹ K⁻¹) for Earth’s magnetic field to be sustained, assuming
negligible radiogenic heating. Later in the planet’s history, after a
solid inner core has formed, the core-mantle heat flux necessary to
sustain a dynamo may be smaller, given that convection can be driven
both compositionally and thermally. Estimates? for the current Q_CMB
(12 ± 5 TW) far exceed this threshold, so for a nominal scenario of
Q_CMB declining or constant with time*” magnetic activity is expected
throughout Earth history, and would probably only have been absent
when internal dynamics differed substantially from those of the
present, for example in periods lacking plate tectonics”. Similarly, evidence
of non-zero palaeomagnetic field places a hard constraint on the
corresponding heat flux of Q_CMB > 2.2 TW before inner-core nucleation.

However, the inner core can be older for lower core thermal conduc- 
tivities°, and within the uncertainty due to the light-element content of 
the core, the inner core can be as old as the earliest recorded terrestrial 
magnetic field!®, that is, up to 4.2 billion years old. Thus, within our 
direct experimental constraints, there is no requirement that Earth's 
geodynamo ever existed in the absence of an inner core. Indeed, the 
planet's dynamo and its solid inner core may have co-existed since soon 
after the formation of Earth. Greater knowledge of the light-element 
content of the core and its effect on thermal conductivity is essential to 
understand the earliest period of Earth’s core evolution. 
METHODS


Briefly, a high-purity Fe foil (99.99%, GoodFellow) placed between two anvils of
the DAC and separated from the anvils by layers of insulating material (NaCl or Ar)
was preheated to a desired stable temperature using double-sided continuous-wave
infrared laser heating, and then pulse-heated on one side with an additional
infrared laser to create a thermal disturbance¹¹. The evolution of this disturbance was
characterized by nanosecond-resolved radiative temperature measurements using
a streak camera coupled to a grating spectrograph that records the thermal
incandescent history from both sides of the foil. The phase shift and the reduction in
amplitude of the temperature disturbance as it propagates across the foil are thus
measured¹¹. At a given pressure, a series of data sets were collected using
different continuous-wave and pulse laser powers. Temperatures studied ranged from
~1,600 K, the lowest detectable temperature, to 4,000 K at the maximum, whereas
temperature disturbances were typically a few hundred kelvin in amplitude.

The temperature evolution was fitted to time-dependent finite-element models 
of the laser-heated DAC!” to determine the thermal conductivity of Fe samples. 
For the finite-element modelling we employed experimentally determined geomet- 
rical parameters and thermochemical parameters determined from known equa- 
tions of state. The thermal conductivity of the sample, together with the thermal 
conductivity of the pressure medium and heating power, were adjusted until the 
best match of modelled and experimental temperature was achieved (Fig. 1a). The 
analysis was rigorously tested for sensitivity to input parameters (Extended Data 
Figs 2, 3 and 6). Total uncertainty and error bars (Fig. 2) were determined from 
the fitting uncertainty (Extended Data Figs 2 and 5), the scatter across different 
data sets (for example, Fig. 3), and uncertainty in input parameters (Extended Data 
Figs 3 and 6). We find the measurements to be sufficiently sensitive to the thermal 
conductivity of the sample foil to provide a major constraint on Fe conductivity 
at core conditions. 

The experiment duration (less than 10 s per temperature history collection)
was kept as short as possible to avoid sample damage and minimize the heating of
optics and DAC that could cause instabilities during long laser-heating runs. Foil
initial thickness (4.01 ± 0.02 µm) and in situ thickness (Extended Data Table 1)
were measured using white-light interferometry of the DAC cavity, and the index of
refraction data for the media under pressure³¹⁻³³; these measurements also
determined the sample-to-diamond-culet distances, which are important parameters in
finite-element calculations. Foil thickness changes measured under compression
were consistent with those derived from the known compressibility of Fe³⁴. For the
NaCl medium, insulation plates were formed and placed on the culets, and foils
were placed between them; in the case of Ar, the foil was suspended on a recess
in the gasket (Re).

A sample of platinum, which has well defined thermal conductivity behaviour¹¹
at high pressure and temperature, was available as a control in some
experiments at low pressures where the DAC cavities were sufficiently large in diameter
(P < 55 GPa) to accommodate a second foil. The Pt foil had the same thickness as
the Fe foil, and was positioned on the plane of the Fe foil in the cavity; for such
foil pairs, sample and insulation thicknesses, cell geometry, pressure, medium,
heating configuration, and detection system were identical, allowing a direct
relative comparison between the thermal transport behaviour of the two materials.
Heat wave propagation across the Pt was much faster than for Fe (for example,
240 ns for the half-rise time, compared to 565 ns in Fe at 48 GPa; see Extended
Data Fig. 4), corresponding to a lower thermal diffusivity for Fe. Fe samples were
also observed to sustain larger axial temperature gradients than the Pt samples,
manifested in a greater difference between peak amplitudes on either side of the
foil. These observations affirm that at the studied conditions, the thermal
conductivity of Pt (160 ± 40 W m⁻¹ K⁻¹; ref. 11) is substantially greater than that of Fe.
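
The half-rise times quoted above translate directly into a relative diffusivity if, as noted in the caption to Extended Data Fig. 4, diffusivity scales as the inverse of the half-rise time for foils of equal thickness. The following sketch makes that comparison and, for orientation only, also applies the classical flash-method relation for an ideal adiabatic slab (α ≈ 0.1388 L²/t½). That relation is not the analysis used in this work, which relies on finite-element modelling, and the foil thickness passed to it is an illustrative assumption.

```python
def diffusivity_ratio(t_half_a_ns: float, t_half_b_ns: float) -> float:
    """Return kappa_a / kappa_b for two foils of equal thickness,
    assuming kappa is inversely proportional to the half-rise time."""
    return t_half_b_ns / t_half_a_ns


def flash_diffusivity(thickness_m: float, t_half_s: float) -> float:
    """Classical flash-method estimate for an ideal, thermally isolated slab.
    Shown for orientation only; a foil in a DAC loses heat to the medium,
    so this is not a substitute for the finite-element fits."""
    return 0.1388 * thickness_m ** 2 / t_half_s


# Half-rise times at 48 GPa quoted in the text.
T_HALF_FE_NS = 565.0
T_HALF_PT_NS = 240.0

fe_over_pt = diffusivity_ratio(T_HALF_FE_NS, T_HALF_PT_NS)
print(f"kappa_Fe / kappa_Pt ~ {fe_over_pt:.2f}")

# Hypothetical foil thickness of 2.9 micrometres, used only as an example input.
alpha_fe = flash_diffusivity(2.9e-6, T_HALF_FE_NS * 1e-9)
print(f"idealized Fe diffusivity ~ {alpha_fe:.2e} m^2/s")
```

Under these assumptions the Fe foil has less than half the thermal diffusivity of the Pt foil, in line with the qualitative statement in the text.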

The Lorenz number for ε Fe was determined by comparing shock wave electrical
resistivity!? and the present thermal conductivity data at comparable pressure and
temperature (Fig. 3). The result is 22 ± 16% lower than the value for a free-electron
metal®> (L = 2.44 × 10⁻⁸ W Ω K⁻²), consistent with theoretical calculations’, which
predict a Lorenz number reduced from the ideal by up to 17%.
Experimental details. To generate thermal perturbations at high initial pressure 
and temperature, we combined double-sided continuous and single-sided pulsed 
laser heating of the DAC sample’. The initial temperature was reached by balancing 
laser power to either side of the sample until temperatures agreed to within 
~100K, and then pulsed heating was used to create a small perturbation in tem- 
perature which propagated across the sample. Our approach is similar to that 
used in traditional flash heating measurements of thermal diffusivity**, modified 
for a specimen under pressure in a DAC"!. The reduction in amplitude and 
phase shifting of the heat pulse with distance is an essentially one-dimensional 
phenomenon!?**, whereas two-dimensional effects have a secondary, but 
non-negligible, impact accounted for via finite-element modelling. 

Precise temperature determination during pulse laser heating was made with
a streak camera detecting system coupled to a spectrometer, capable of detecting
thermal emission in a time-resolved manner in a spectrogram. Spectrograms
(3–10 µs) were synchronized to the heating pulses to follow the sample's
temperature response on both sides. Thermal emission was fitted to a greybody Planck
function assuming constant emissivity during the heat cycle¹¹, a reasonable
approximation since thermal perturbations are small. The time resolution of the
temperature measurements was 26 ns (3-µs sweep) to 82 ns (10-µs sweep). Spectrograms
were integrated over 10² to 10⁴ perturbation cycles, at a rate of 1 kHz and total
integration times of 0.1–10 s, the total integration time depending on
temperature. Emission was calibrated to a tungsten ribbon lamp of known radiance.
Temperatures were detected only above ~1,600 K owing to a lack of signal at lower
temperatures. Experiments were limited at high temperatures owing to visible foil
deformation in the melting regime of the sample and pressure medium!?.
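
To illustrate how a temperature follows from a greybody spectrum with constant emissivity, the sketch below generates a synthetic Planck spectrum and recovers the temperature with a linear fit in the Wien approximation, a common shortcut in spectroradiometry. It is not the fitting code used in this work: the 600–800 nm window, the emissivity of 0.3, the test temperature of 2,500 K and all function names are assumptions chosen for the example.

```python
import math

H = 6.62607015e-34   # Planck constant (J s)
C = 2.99792458e8     # speed of light (m/s)
KB = 1.380649e-23    # Boltzmann constant (J/K)


def greybody(wavelength_m: float, temperature_k: float, emissivity: float) -> float:
    """Spectral radiance of a greybody (Planck function times a constant emissivity)."""
    x = H * C / (wavelength_m * KB * temperature_k)
    return emissivity * 2.0 * H * C ** 2 / wavelength_m ** 5 / math.expm1(x)


def wien_fit_temperature(wavelengths_m, intensities) -> float:
    """Fit ln(I * lambda^5) against 1/lambda; in the Wien limit the slope is -hc/(kB T).
    A constant emissivity only shifts the intercept, so it drops out of T."""
    xs = [1.0 / w for w in wavelengths_m]
    ys = [math.log(i * w ** 5) for w, i in zip(wavelengths_m, intensities)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -H * C / (KB * slope)


# Synthetic spectrum over an assumed 600-800 nm detection window.
wavelengths = [600e-9 + i * 5e-9 for i in range(41)]
spectrum = [greybody(w, 2500.0, 0.3) for w in wavelengths]

t_fit = wien_fit_temperature(wavelengths, spectrum)
print(f"fitted temperature ~ {t_fit:.0f} K")
```

Because the emissivity enters only as an additive constant in the logarithm, this example also shows why assuming a constant emissivity over a small thermal perturbation is sufficient to track temperature changes.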

Pressure was measured by the ruby fluorescence technique at room temperature. 
Thermal pressures produced during laser heating are positive but small (of the 
order of a few gigapascals) in sample configurations similar to those used here'® 
and do not significantly affect our results. 

At pressures and temperatures in the stability field of γ Fe, face-centred cubic
γ Fe and hexagonal close-packed ε Fe are commonly observed to coexist in
experiments!®. Consequently, our data at these conditions may probe a mixed state of γ
Fe and ε Fe with a variable γ Fe composition (Fig. 2). In contrast, at higher
pressures, ε Fe is typically the only observed solid phase at all temperatures!*’, so our
data in this regime directly probe pure ε Fe. To test these expectations, we have
also performed in situ X-ray diffraction measurements on laser-heated Fe samples
prepared in a manner identical to that used in this study (with NaCl media), at the
P02.2 beamline (ECB) of PETRA III in Hamburg. Using comparable timescales
of heating, we confirm that a mixed phase should be present in the lower-pressure
experiments reported in this study, but not at higher pressures.

To prevent the uptake of impurities in our initially high-purity Fe foils, pressure- 
medium materials (NaCl, Ar) were chosen and carefully prepared so that 
reactions with the sample are avoided**’. During preparation, Fe foils and NaCl
media were kept dry, and contact with the atmosphere was minimized to
prevent foil oxidation. Carbon from diamond anvils is known to react with Fe at
high pressures and temperatures in laser-heated DAC experiments, but generally 
at much higher temperatures (and longer timescales) than probed in this work'®*”. 
In testing our sample preparation and heating technique in separate in situ X-ray 
diffraction experiments, we ruled out oxidation or reaction with the medium, and 
confirmed that carbide formation occurs at much higher temperatures and longer 
heating timescales than we have used here. Thus, our Fe samples should remain 
very pure at the pressures, temperatures, and timescales of this study. Analysis of 
the recovered sample from experiments at 58-74 GPa (using electron imaging, 
energy dispersive scattering for chemical analysis, and a focused ion beam to sec- 
tion the foil at heated regions) found no detectable local enrichment of impurities 
in the heated areas of the sample, indicating bulk impurity levels well below detec- 
tion limits (<0.6 wt% C, <0.6 wt% O, <100 parts per million Ar), consistent with 
expectations from X-ray diffraction. Finally, no systematic changes in measured 
conductivities were observed with heating time, indicating that samples did not 
undergo any progressive transformation (such as a reaction) that influenced the 
thermal conductivity. 

Model for pressure variation of thermal conductivity. The model used here to
estimate pressure variation of thermal conductivity (equation (3)) is based on a
formal differentiation of the electronic thermal resistivity (W_e = 1/k_e) with respect
to density combined with the definition of the Grüneisen parameter
(γ = (∂ln θ_D/∂ln ρ)_T, where θ_D is the Debye temperature, and ρ is density), which
leads to”:

\[ \left(\frac{\partial \ln W_e}{\partial \ln \rho}\right)_T = -2\gamma + \left(\frac{\partial \ln C}{\partial \ln \rho}\right)_T \qquad (4) \]

where C is a constant containing lattice and band structure information
originating from the Bloch-Grüneisen expression. Bohlin" finds (∂ln C/∂ln ρ)_T to be equal
to −1/3 in ordinary pure metals; the variation of electronic thermal conductivity
with pressure can then be expressed in terms of the isothermal bulk modulus (K_T)
and the Grüneisen parameter (γ) as equation (3).

The Grüneisen parameter of Fe is fairly well known at high pressure and
room temperature: the data of refs 42 and 43 agree well, particularly above
100 GPa. At core conditions (high T), γ(P, T) and K_T(P, T) were evaluated using
a thermal equation of state of Fe (ref. 27), with γ = γ₀(V/V₀)^q, where γ₀ = 1.78,
q = 0.69 and V₀ = 6.73 cm³ mol⁻¹. The P, T description of γ is expressed in a
polynomial form:

\[ \gamma(P, T) = \frac{a + cP + eT + gP^2 + iT^2 + kPT}{1 + bP + dT + fP^2 + hT^2 + jPT} \qquad (5) \]
We described K_T(P, T) by the following equation:

\[ K_T(P, T) = K_1 + \frac{K_2 P}{\ln(P)} + \frac{K_3 T}{\ln(T)} \qquad (6) \]

All the coefficients for γ and K_T (equations (5) and (6)) are given in Extended
Data Table 2.
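
As a rough numerical illustration of this model, the sketch below evaluates the volume dependence of the Grüneisen parameter with the parameters quoted above (γ₀ = 1.78, q = 0.69) and the resulting logarithmic pressure derivative of the electronic thermal conductivity. The derivative follows from equation (4) with (∂ln C/∂ln ρ)_T = −1/3, k_e = 1/W_e and the definition K_T = ρ(∂P/∂ρ)_T, which give (∂ln k_e/∂P)_T = (2γ + 1/3)/K_T. The compression V/V₀ = 0.6 and the bulk modulus of 600 GPa are illustrative assumptions, not values taken from this study.

```python
def gruneisen(v_over_v0: float, gamma0: float = 1.78, q: float = 0.69) -> float:
    """Gruneisen parameter from the power law gamma = gamma0 * (V/V0)**q."""
    return gamma0 * v_over_v0 ** q


def dlnk_dp(gamma: float, bulk_modulus_gpa: float) -> float:
    """Logarithmic pressure derivative of electronic thermal conductivity (per GPa),
    using (2*gamma + 1/3) / K_T as derived from equation (4)."""
    return (2.0 * gamma + 1.0 / 3.0) / bulk_modulus_gpa


# Illustrative inputs (assumptions): 40% volume compression, K_T = 600 GPa.
gamma = gruneisen(0.6)
slope = dlnk_dp(gamma, 600.0)

print(f"gamma ~ {gamma:.2f}")
print(f"d ln k / dP ~ {slope * 100:.2f} % per GPa")
```

With inputs of this order the conductivity changes by well under 1% per gigapascal, which is the sense in which the pressure dependence becomes weak at high compression.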

This model gives good agreement with ε Fe electrical resistivity data at lower
pressures and ambient temperatures⁵,¹⁴, fits the present thermal conductivity
results for ε Fe well (Fig. 2), and implies that thermal conductivity is only weakly
pressure dependent above 100 GPa, consistent with prior expectations”. Thus, our
measurements, taken at pressures close to those at the top of Earth’s core, should
constrain overall core conductivity accurately.
The Lorenz number for ε Fe. The temperatures and pressures of our thermal
conductivity measurements overlap with those of shock wave electrical resistivity
measurements¹⁵, allowing a comparison between the resistivity and thermal
conductivity measurements to obtain an empirical value for L.

At 112 GPa, where the most extensive high temperature data set was
available in the present results, electrical conductivity was estimated as follows using
the data of ref. 15. The two lowest pressure points from that study at 101.1 GPa
and 146.7 GPa are solid-state data and so are comparable to the present results;
a higher pressure point corresponds to the liquid'>"*. First, a temperature for the
middle of the three data points (146.7 GPa, 3,357 K) after isentropic release from
the initial conditions (173.4 GPa, 3,552 K), which was not reported, was estimated from
the scaling of release behaviour reported by ref. 15; release temperatures were
confirmed by independent calculation using an Fe equation of state³⁴. The electrical
conductivity at 112 GPa is then estimated as (1.13 ± 0.11) × 10⁶ S m⁻¹ at 2,332 K,
based on a linear interpolation between the solid-state data points, and
assuming an uncertainty of ~10%, consistent with uncertainties reported for similar
measurements!® and scatter in the data reported by ref. 15. At this temperature
in our experiments, k = 50 ± 10 W m⁻¹ K⁻¹ (Fig. 3). The corresponding value of
L is then (1.9 ± 0.4) × 10⁻⁸ W Ω K⁻², 22 ± 16% less than the standard value for a
free-electron metal. This correction has a small influence on our analysis. For
example, assuming the free-electron value of L, the shock wave results of ref. 15
imply a value of 67 W m⁻¹ K⁻¹ at 112 GPa and 2,330 K, only slightly above the
measured value.
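
The empirical Lorenz number follows from the Wiedemann-Franz relation, L = k/(σT). A minimal check using only the central values quoted above (uncertainties are not propagated, and the names are illustrative) is:

```python
L_FREE_ELECTRON = 2.44e-8  # free-electron Lorenz number (W Ohm K^-2)


def lorenz_number(k_w_per_m_k: float, sigma_s_per_m: float, temperature_k: float) -> float:
    """Empirical Lorenz number from the Wiedemann-Franz relation, L = k / (sigma * T)."""
    return k_w_per_m_k / (sigma_s_per_m * temperature_k)


# Central values at 112 GPa quoted in the text.
lorenz = lorenz_number(50.0, 1.13e6, 2332.0)
reduction = 1.0 - lorenz / L_FREE_ELECTRON

print(f"L ~ {lorenz:.2e} W Ohm K^-2")
print(f"reduction relative to free-electron value ~ {reduction * 100:.0f}%")
```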

The correction to the standard value of L determined here for ε Fe is typical for
Fe at various conditions and phases”!444 (30%) and is similar to other transi- 
tion metals!)*. In Pt, L is measured! to deviate from the ideal value by -+30% at 
temperatures up to 2,000 K. For Mo, deviations of —10% to —30% are predicted at 
high temperature*®. The variation of L across transition metals at low temperature 
alone is large*®, with values such as in Cu (—9%) and W (+31%). 

We note that early shock data on Fe electrical resistivity at high pressures!” find- 
ing systematically higher electrical conductivities compared to later work", cannot 
be considered to agree with our measurements, as an unrealistically large reduction 
in the Lorenz number would be needed. It has been proposed that spurious values 
were obtained in the earlier studies'® at higher pressure (P > 50 GPa) owing to 
insulator-conductor transformation of epoxies used in target construction, an 
effect avoided in later measurements!». 

Model for temperature variation of thermal conductivity. Equation (2) was
selected in consideration of the observed variation of electrical conductivity in ε Fe
with temperature". Electrical conductivity is modelled as following a relationship:

\[ \sigma = \sigma_0 + A T^{n} \qquad (7) \]

where n = −1 is typically assumed for metals at high temperatures as in the
Bloch-Grüneisen model*”"*"“. A value closer to n = −1.3 has been suggested for Fe at
high pressures from resistivity measurements, under both shock and static
loading, which probed temperatures and pressures similar to those examined here".
Similarly, fitting equation (7) to resistivity data under external heating of statically 
compressed samples®, for which temperatures are particularly accurate, yielded 
values of n= —1.50 £0.07, 09 = (1.04 £0.46) x 10°, A= (6.51 +2.2) x 10!°, for 
a in units of siemens per metre and T in units of kelvin (Extended Data Fig. 1a). 
Then, considering the Wiedemann-Franz relation (equation (1)), we can write:

\[ k = L\left(T\sigma_0 + A T^{1+n}\right) \qquad (8) \]

leading to the empirical form in equation (2). We chose here n = −1.5, though
results are not significantly different selecting n = −1.3.


Equation (2) is fitted to the present measurements at 112 GPa together with shock
wave resistivity data¹⁵, interpolated to 112 GPa as discussed above, and static
resistivity data⁵,¹⁴ extrapolated to 112 GPa using a double-exponential fit of the form:


A a+ G,exp(7P) + G,exp(T2P) (9) 
a 
An initial fit gave a = (0.89 ± 1.33) × 10⁻³ W m⁻¹ K⁻², b = 2,040 ± 140 W m⁻¹ K^(−1/2).
The linear component of the fit is nearly zero, thus a reasonable simplified version
of this model for Fe is:

\[ k \approx b'/\sqrt{T} \qquad (10) \]

where b′ = 1,972 ± 83 W m⁻¹ K^(−1/2) (Fig. 3 and Extended Data Fig. 1b).
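
The two forms of the temperature model can be compared numerically. The sketch below assumes that equation (2) has the form k = aT + b/√T, as implied by equation (8) with n = −1.5, and uses the central fit values quoted above; uncertainties are ignored and the function names are illustrative.

```python
import math


def k_full(temperature_k: float, a: float = 0.89e-3, b: float = 2040.0) -> float:
    """Assumed form of equation (2): k = a*T + b/sqrt(T), in W m^-1 K^-1."""
    return a * temperature_k + b / math.sqrt(temperature_k)


def k_simple(temperature_k: float, b_prime: float = 1972.0) -> float:
    """Simplified model of equation (10): k = b'/sqrt(T), in W m^-1 K^-1."""
    return b_prime / math.sqrt(temperature_k)


for temperature in (2000.0, 3000.0, 4000.0):
    print(f"T = {temperature:.0f} K: "
          f"full = {k_full(temperature):.1f}, simple = {k_simple(temperature):.1f}")
```

Both forms decrease with temperature over this range and differ by only a few W m⁻¹ K⁻¹, which is why dropping the linear term is a reasonable simplification at 112 GPa.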

The model captures a decrease in the thermal conductivity with temperature,
which is seen in the present measurements and also implied by the prior resistivity
data⁵,¹⁴,¹⁵ (Fig. 3). In terms of electrical resistivity (Extended Data Fig. 1c), the
scaling with temperature obtained by the model compares well with that observed
by ref. 5 in ε Fe at lower pressures, and shows a similar dependence to that seen
in γ Fe (or possibly in the γ-ε mixed phase) at high temperatures”". It is seen that
ε Fe up to 112 GPa has higher resistivity than γ Fe (or its mixed phase) at lower
pressure (Extended Data Fig. 1c), consistent with our experimental observation of
higher thermal conductivity in γ Fe compared to ε Fe in the low-pressure region
(Fig. 2).

We note that the minimum measured thermal conductivity is in close agreement
with values expected at traditional resistivity saturation® (Extended Data Fig. 1b);
however, as resistivity saturation in Fe at extremes has not been clearly confirmed
by theoretical studies and since available saturation models? cannot satisfactorily
describe the data, we conclude that at present there is no reason to assume that
resistivity saturation has occurred. Assuming it has, then ε Fe at temperatures above
~3,000 K is saturation-dominated, such that thermal conductivities at core
conditions would be somewhat higher (60–80 W m⁻¹ K⁻¹) than assessed by the present
modelling; however, this upper bound on conductivity is still low compared to
many prior estimates, and would not substantially alter our main conclusions.
Error assessment in the thermal conductivity determination. The laser-heated 
DAC in combination with numerical simulations has been shown to be a promising 
tool for studying heat transfer at high pressures and temperatures! !!76047-50, This 
approach requires a detailed understanding of heat transfer in the DAC, including 
quantitative relationships between the temperature distribution, pressure chamber 
geometries and sample physical properties. 

Finite-element model fits to temperature histories were generally performed 
using a manual adjustment of model parameters. This approach was evalu- 
ated against a Levenberg-Marquardt least-squares minimization of the finite- 
element model variables (Extended Data Fig. 5). This automatic optimization was 
able to improve fit quality but the improvement was not statistically significant. 
Furthermore, as a good initial guess was required, this additional step only added 
to the processing time, and was therefore not used for all data sets. 

In the present study, all input parameters in modelling were carefully examined
for their effect on the determination of sample thermal conductivity (Extended
Data Figs 2 and 3). Uncertainties in the input parameters (such as pressure chamber
geometry) were in this way included in our overall uncertainty determination for
k. The heat capacity C_P of the pressure medium has a negligible effect (Extended
Data Fig. 3a). For C_P of Fe we derived a range of values of 500–700 J kg⁻¹ K⁻¹ from
equations of state for ε Fe (refs 34 and 51) and other estimates”. Within this range,
the resulting sample k is unaffected (Extended Data Fig. 3b). Thermal
conductivity of the diamond anvils, temperature dependence of thermal conductivity of
the pressure medium, and smaller or larger laser beam size (by about 13%) also
have negligible effects on the sample k (Extended Data Fig. 3c-e). Sample and
insulation layer thicknesses, on the other hand, contribute to the uncertainty in
sample k: an approximately ±20% change in thicknesses leads to ±7 W m⁻¹ K⁻¹
changes in sample k (Extended Data Fig. 3f-i). We assume a constant value of k for
the foil in our simulations, but no significant change in results is obtained using a
temperature-dependent k (Extended Data Fig. 3j).

To check potential couplings between the uncertainties in the input parameters, 
we have also propagated uncertainty in our input parameters in a more rigorous 
manner using a Monte Carlo approach (Extended Data Fig. 6). To do this, we 
considered only parameters which were identified as having a first-order impact on 
the measurements: the thicknesses of the medium on both sides of the sample, and 
the sample thickness. We performed 64 Monte Carlo samples within the Gaussian 
probability distributions of the thickness parameters, given standard deviations 
of 30% in each, for a representative experiment at 130 GPa (see Extended Data 
Fig. 6a). For each sampling, the data was fitted automatically (Extended Data 
Fig. 5) to determine the two thermal conductivities and the powers for the three 
lasers (Extended Data Fig. 6b). The distribution in the values for k of Fe has a 
standard deviation comparable to our single-point error (Extended Data Fig. 6d). 
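
The structure of such a Monte Carlo propagation can be sketched as follows. The finite-element fit itself cannot be reproduced in a few lines, so the example replaces it with a hypothetical linear surrogate whose slopes are read off the single-parameter tests listed in the caption to Extended Data Fig. 3 (an example run at 112 GPa, not the 130 GPa run analysed here). The surrogate, the averaging of the two insulation thicknesses, the random seed and all names are assumptions made for illustration; the numbers it produces are not results of this study.

```python
import random
import statistics

# Reference values from the 112 GPa sensitivity tests (Extended Data Fig. 3 caption).
K_REF = 30.0          # best-fit sample k (W m^-1 K^-1)
SAMPLE_REF = 2.6      # sample thickness (micrometres)
INSULATION_REF = 1.6  # insulation thickness on each side (micrometres)

# Linearized slopes between the perturbed cases in that caption (assumption).
DK_DSAMPLE = (37.0 - 22.0) / (3.0 - 2.0)      # W m^-1 K^-1 per micrometre
DK_DINSULATION = (27.0 - 39.0) / (2.0 - 1.0)  # W m^-1 K^-1 per micrometre


def surrogate_k(sample_um: float, pulsed_um: float, opposite_um: float) -> float:
    """Hypothetical linear stand-in for a full finite-element fit."""
    insulation_mean = 0.5 * (pulsed_um + opposite_um)
    return (K_REF
            + DK_DSAMPLE * (sample_um - SAMPLE_REF)
            + DK_DINSULATION * (insulation_mean - INSULATION_REF))


def monte_carlo(n_samples: int = 64, rel_sigma: float = 0.30, seed: int = 0) -> list:
    """Draw Gaussian thicknesses (30% standard deviation) and evaluate the surrogate."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        sample = rng.gauss(SAMPLE_REF, rel_sigma * SAMPLE_REF)
        pulsed = rng.gauss(INSULATION_REF, rel_sigma * INSULATION_REF)
        opposite = rng.gauss(INSULATION_REF, rel_sigma * INSULATION_REF)
        results.append(surrogate_k(sample, pulsed, opposite))
    return results


k_values = monte_carlo()
print(f"mean k = {statistics.mean(k_values):.1f}, "
      f"standard deviation = {statistics.stdev(k_values):.1f} W m^-1 K^-1")
```

In the actual analysis each of the 64 draws is followed by an automatic fit of the two thermal conductivities and the three laser powers, so correlations between parameters are captured in a way that a linear surrogate cannot.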

While suitably sensitive to the thermal conductivity of the foil, our measure- 
ments are less sensitive to the thermal conductivity of the insulating medium, 
which is included as a variable in fitting (usually as a constant) but which had 
values more sensitive to the assumed sample geometry (thickness of the insu- 
lation layers), laser beam diameter and laser power. Thus, the conductivities of 
insulating media are not reported, as they are not robustly determined by our
approach. For Ar, the values of k obtained in the fits were generally in the range of
50–100 W m⁻¹ K⁻¹, consistent with previously reported values”.
Extended Data Figure 1 | High-temperature transport properties of
Fe. a, Graph of the electrical conductivity° as a function of temperature
of ε Fe at 65 GPa and model fit (to equation (7)). b, Thermal conductivity
... linear term (to equation (10)) is a dashed blue line. Present data are solid
circles and data derived from prior electrical resistivity measurements⁵,¹⁴,¹⁵
are open symbols (see Fig. 3). The red band is the minimum thermal
conductivity assuming resistivity saturation’. c, Electrical resistivity at ...
Extended Data Figure 2 | Comparison between measurements and
models for different values of thermal conductivity. Data for pulsed
and opposite sides of the foil are dots; the larger temperature excursion is
on the pulsed side. Green, magenta and cyan curves are simulations with
different values of sample k, all other parameters being held constant.
The data sets at 112 GPa (a) and 130 GPa (b) have been measured using
3-µs and 10-µs sweep windows, respectively.
Extended Data Figure 3 | Tests of the sensitivity of finite-element
model results to input parameters for an example run at 112 GPa.
This experiment shows a large amplitude of temperature modulation
that accentuates the effects of parameter changes. A best-fit value of
k = 30 W m⁻¹ K⁻¹, obtained using parameters listed in Extended Data
Table 1, is obtained from these model fits unless stated otherwise. a, Effect
of heat capacity of the Ar pressure medium. Uncertainty in medium
C_P has no effect on k of the sample. b, Effect of heat capacity of the
sample. Temperature profiles for two values of C_P of Fe (500 J kg⁻¹ K⁻¹
and 700 J kg⁻¹ K⁻¹) indicate that results are only weakly affected by the
uncertainty in C_P for Fe. c, Change in the thermal conductivity value of
diamond anvils from 1,500 W m⁻¹ K⁻¹ to 2,000 W m⁻¹ K⁻¹ requires an
increase in thermal conductivity of the sample from 30 W m⁻¹ K⁻¹ to
31 W m⁻¹ K⁻¹. d, Effect of using a T-dependent k of the medium. After
ref. 49, a dependence k(T) = k₃₀₀(300/T)^m is used, where k₃₀₀ is the 300-K
conductivity, T is in kelvin, and m is an exponent (of order 1); k₃₀₀
(300 W m⁻¹ K⁻¹) is extrapolated from prior results at lower pressure’?
and m (0.7) is fitted to the present data. No change in sample k is indicated
using this or any other k(T) model we tested for the media. e, Laser
beam radius change of ±13% does not affect the temperature noticeably.
f, A sample thinner by 23% (reduced from 2.6 µm to 2.0 µm) would
require a lower sample k of 22 W m⁻¹ K⁻¹. g, A sample thicker by 15%
(increased from 2.6 µm to 3.0 µm) would require an increased sample k of
37 W m⁻¹ K⁻¹. h, The insulation layer was decreased on both sides by
38%, from 1.6 µm to 1.0 µm. Sample k had to increase to 39 W m⁻¹ K⁻¹.
i, The insulation layer was increased on both sides by 25%, from 1.6 µm to
2.0 µm. Sample k had to decrease to 27 W m⁻¹ K⁻¹. j, Effect of including
T dependence of sample k in models. The temperature profile calculated
using our global fit at 112 GPa (equation (2)) is shown as a magenta
line; this dependence scaled within its uncertainty (reduced by a factor
of 0.83) to improve the fit is shown as a cyan line. The resulting sample
k varies between 24 W m⁻¹ K⁻¹ and 35 W m⁻¹ K⁻¹ in the T range of the
experiment; the estimate assuming constant sample k is the average of
these values.
Extended Data Figure 4 | Comparison of data on Fe and Pt at 48 GPa
for an identical sample configuration. The data clearly show slower
propagation of heat across the Fe foil compared to Pt (ref. 11), as given
by the half-rise time τ. This observation directly shows that thermal
diffusivity κ = (k/ρC_P) of Fe is much less than Pt, since!*® κ ∝ 1/τ.
Similarly, the smaller amplitude of the perturbation upon opposite surface
arrival indicates a smaller k in Fe than in Pt.

© 2016 Macmillan Publishers Limited. All rights reserved 



Extended Data Figure 5 | Comparison between manual and automatic
optimization results for an experiment at 130 GPa. The manual
approach, used as our primary fitting method, was based on an adjustment
of model parameters by hand within a precision of 5 W m⁻¹ K⁻¹, giving
k = 45 W m⁻¹ K⁻¹ and k_Ar = 60 W m⁻¹ K⁻¹ as the best fit. The automatic
result is the best fit based on a Levenberg-Marquardt least-squares
minimization of model parameters, yielding k = 38.6 W m⁻¹ K⁻¹ and
k_Ar = 50.4 W m⁻¹ K⁻¹. The automatic optimization obtained a better
least-squares fit (χ² improved by 23%); however, the difference in k is not
statistically significant.
Extended Data Figure 6 | Monte Carlo analysis of error coupling in
thickness uncertainties and effect on thermal conductivities, for the
130-GPa data set shown in Extended Data Fig. 5. a, Histogram showing
randomly sampled thicknesses (upper and lower medium, and foil)
in Gaussian probability distributions with standard deviation 30%.
b, Thermal conductivities for Ar and Fe for 64 samples. The greyscale
refers to the value of the coupler thickness, showing the correlation
between high values for k and thicker coupler. The results of fits shown
in Extended Data Fig. 5 are blue and red triangles, while the mean and
one standard deviation found from the spread of sampled thermal
conductivities is the orange triangle. c and d are histograms showing the
distribution of thermal conductivities in b.
Extended Data Table 1 | Input parameters used for the finite-element modelling


P= medium sample pulsed side opposite medium Cc, medium iron C, iron 
thickness thickness side thickness density density 
(GPa) (um) (um) (um) (kg m*) (Jkg*K") (kgm) (Jkg"K") 

35 NaCl 3.0 8.0 7.0 3630 aes atte 2 STO0UK  eed2 450 
1103, T >1000K 

48 NaCl 2.9 7A 6.7 3911 qABe ed PS VONOR Gage 450 
4103, T >1000K 

5B Ar 2.9 15 6.5 4539 570 10174. 700 

74 Ar 28 1.0 6.4 4800 570 10476 700 

88 Ar 27 1.7 17 5057 570 10800 700 

112 Ar 26 1.6 16 5326 570 11225 700 

130. Ar 25 15 15 5550 570 11590 700 
Extended Data Table 2 | Coefficients for the Grüneisen parameter
and the isothermal bulk modulus used to estimate pressure
variation of thermal conductivity


Coefficient for Coefficient 
Griineisen parameter __ for K,, 

a 1.76x 10° K, 97.50 
b 2.04 x10 Ky 2018 
e 2.90x10? Kk, -0.26 
d -1.32*104 

& 1,87 810" 

f  3.90x10° 

g 3.42%x10° 

h 2.55 10° 

i 3.05x10° 

j 9.10% 10° 

k -4.37*107 
The industrial melanism mutation in British
peppered moths is a transposable element 


Arjen E. van’t Hof", Pascal Campagne!*, Daniel J. Rigden!, Carl J. Yung!, Jessica Lingley!, Michael A. Quail?, Neil Hall’, 


Alistair C. Darby! & Ilik J. Saccheri! 


Discovering the mutational events that fuel adaptation to 
environmental change remains an important challenge for 
evolutionary biology. The classroom example of a visible 
evolutionary response is industrial melanism in the peppered 
moth (Biston betularia): the replacement, during the Industrial 
Revolution, of the common pale typica form by a previously 
unknown black (carbonaria) form, driven by the interaction 
between bird predation and coal pollution’. The carbonaria 
locus has been coarsely localized to a 200-kilobase region, but the 
specific identity and nature of the sequence difference controlling 
the carbonaria-typica polymorphism, and the gene it influences, 
are unknown’. Here we show that the mutation event giving rise 
to industrial melanism in Britain was the insertion of a large, 
tandemly repeated, transposable element into the first intron of 
the gene cortex. Statistical inference based on the distribution of 
recombined carbonaria haplotypes indicates that this transposition 
event occurred around 1819, consistent with the historical record. 
We have begun to dissect the mode of action of the carbonaria 
transposable element by showing that it increases the abundance of 
a cortex transcript, the protein product of which plays an important 
role in cell-cycle regulation, during early wing disc development. 
Our findings fill a substantial knowledge gap in the iconic example 
of microevolutionary change, adding a further layer of insight into 
the mechanism of adaptation in response to natural selection. The 
discovery that the mutation itself is a transposable element will 
stimulate further debate about the importance of ‘jumping genes’ 
as a source of major phenotypic novelty³.

Ecological genetics, the study of polymorphism and fitness in
natural populations, has been revitalised through the application of
next-generation sequencing technology to open up what were previously
treated as genetic black boxes⁴,⁵. Growing appreciation of the
loci and developmental networks that generate adaptive phenotypic
variation⁶ promises to answer fundamental questions about the genetic
architecture of adaptation, such as the prevalence of genomic hotspots
for adaptation⁷, the relative contributions of major- and minor-effect
mutations⁸, and the structural nature and mode of action of beneficial
mutations⁹. Characterizing the identity and origin of functional
sequence polymorphisms provides the explicit link between the
mutation process and natural selection. In this context, while industrial
melanism in the peppered moth has retained its appeal as a
graphic example of the spread of a novel mutant rendered favourable
by a major change in the environment, the crucial piece of the
puzzle that has been missing is the molecular identity of the causal
mutation(s)¹⁰.

A combined linkage and association mapping approach previously
localized the carbonaria locus to a <400-kb region orthologous to
Bombyx mori chromosome 17 (loci b-d)². Thirteen genes and two
microRNAs occur within this interval, none of which was known to be
involved in wing pattern development or melanization. By extending
the association mapping approach to a larger population sample and
more closely spaced genetic markers (see Methods), we narrowed the
carbonaria candidate region to about 100 kb (Fig. 1a). The candidate
region resides entirely within the span of one gene, the orthologue of
Drosophila cortex (cort), the only known function of which is as a
cell-cycle regulator during meiosis¹¹. In B. betularia, cortex consists
of eight non-first exons, multiple alternative first exons (of which only
two, 1A and 1B, are strongly expressed in developing wing discs), and
a very large first intron (Fig. 1b).

The rapid spread of carbonaria gave rise to strong linkage
disequilibrium², such that many sequence variants are associated with the
carbonaria phenotype. This poses a challenge for isolating the specific
causal variant(s). We reasoned that if the carbonaria mutation arose
on an ancestral typica haplotype², the hitchhiking variants should in
principle also be present at some frequency within the typica
population, leaving the causal variants as the only ones unique to carbonaria.
High-quality contiguous reference sequences were assembled from
tiled bacterial artificial chromosome (BAC) and fosmid clones,
resulting in one carbonaria and three different typica core haplotypes (see
Methods and Extended Data Fig. 1). Alignment of these sequences
(Supplementary Data 1) revealed 87 melanization candidate
polymorphisms (Fig. 1b and Supplementary Table 1), concentrated within the
large first intron of cortex (69–91 kb, depending on haplotype).
Eighty-five candidates were eliminated using an increasing number of typica
individuals to exclude rare variants. A single nucleotide polymorphism
(carbonaria_candidate_25) was eventually excluded on the basis of one
individual out of 283 typica, leaving a very large insert
(carbonaria_candidate_45) as the only remaining candidate.
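
The exclusion logic of this paragraph amounts to a two-stage filter: first keep positions where the carbonaria reference differs from every typica reference haplotype, then drop any candidate whose carbonaria allele turns up in a typica individual. The following is a minimal sketch of that reasoning only; the function names, locus labels and alleles are hypothetical toy values, not data from the study.

```python
def candidate_polymorphisms(carbonaria, typica_haplotypes):
    """Positions where the carbonaria allele differs from every typica reference."""
    return sorted(
        pos for pos, allele in carbonaria.items()
        if all(hap.get(pos) != allele for hap in typica_haplotypes)
    )


def exclude_by_typica_sample(candidates, carbonaria, typica_genotypes):
    """Drop candidates whose carbonaria allele is seen in any typica individual."""
    remaining = []
    for pos in candidates:
        seen_in_typica = any(
            carbonaria[pos] in genotype.get(pos, ())
            for genotype in typica_genotypes
        )
        if not seen_in_typica:
            remaining.append(pos)
    return remaining


# Hypothetical toy data: one carbonaria reference, three typica references.
carbonaria = {"cand_1": "A", "cand_2": "T", "cand_3": "INS"}
typica_refs = [
    {"cand_1": "G", "cand_2": "C", "cand_3": "-"},
    {"cand_1": "G", "cand_2": "C", "cand_3": "-"},
    {"cand_1": "G", "cand_2": "C", "cand_3": "-"},
]
# Diploid genotypes of wild typica; the second individual is a rare carrier.
typica_sample = [
    {"cand_1": ("G", "G"), "cand_2": ("C", "C"), "cand_3": ("-", "-")},
    {"cand_1": ("G", "A"), "cand_2": ("C", "C"), "cand_3": ("-", "-")},
]

candidates = candidate_polymorphisms(carbonaria, typica_refs)
remaining = exclude_by_typica_sample(candidates, carbonaria, typica_sample)
print(candidates)
print(remaining)
```

In this toy case a single typica carrier is enough to exclude cand_1, mirroring how one individual out of 283 excluded carbonaria_candidate_25.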

The insert was found to be present in 105 out of 110 fully black
moths (wild caught in the UK since 2002) and absent in all (283) typica
tested (see Methods and Extended Data Fig. 2). Consistent with local
carbonaria morph frequencies of 10–30% (ref. 12), 2 out of 105
individuals were homozygous for the carbonaria insert. Five individuals
that were morphologically indistinguishable from carbonaria did not
possess the carbonaria insert; they do not present any strong haplotype
association based on this set of candidate loci but do all differ from
the core carbonaria haplotype at many positions. Our interpretation is
that these individuals are hetero- or homozygous for the most extreme
of the insularia alleles (intermediate phenotypes), which are known
to occasionally produce carbonaria-like phenotypes¹³,¹⁴ and
segregate as alleles of the carbonaria locus in classical genetics crosses¹⁴.
Conversely, none of the genotyped insularia morphs (31 individuals,
covering the full spectrum of variation from i₁ to i₃ (ref. 14)) contains
the carbonaria insert (Extended Data Fig. 2). We conclude that the
large insert is the carbonaria mutation.
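
The link between a morph frequency of 10–30% and the small number of insert homozygotes can be illustrated with a short calculation. The sketch below assumes Hardy–Weinberg proportions and a fully dominant carbonaria allele; both are simplifying assumptions made here for illustration, and the function name is hypothetical.

```python
import math


def homozygote_fraction_among_melanics(morph_frequency):
    """Expected share of c/c among melanic moths.

    Assumes Hardy-Weinberg proportions and full dominance, so that the
    melanic phenotype frequency equals 1 - (1 - q)**2 for allele frequency q.
    """
    q = 1.0 - math.sqrt(1.0 - morph_frequency)  # carbonaria allele frequency
    return q * q / morph_frequency


for morph_frequency in (0.10, 0.30):
    fraction = homozygote_fraction_among_melanics(morph_frequency)
    print(f"morph frequency {morph_frequency:.0%}: "
          f"expected c/c among 105 melanics = {105 * fraction:.1f}")
```

Under these assumptions only a few per cent of melanic moths are expected to be homozygous, so a sample of 105 insert carriers would contain a handful of homozygotes at most, in line with the 2 observed.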

The carbonaria insert is 21,925 nucleotides long and is composed
of a roughly 9-kb essentially non-repetitive sequence (except for
approximately 370 nucleotides at the repeat unit junctions) that is
tandemly repeated approximately two and one-third times, with only
minor differences among the repeats (Fig. 1c). The insert bears the
hallmark of a class II (DNA cut-and-paste) transposable element: short
inverted repeats (6 bp) and duplication of the (4-bp) target site present
in typica haplotypes (Extended Data Fig. 3). We estimate that there
are approximately 255 and 60 genomic copies, respectively, of the 9-kb
carbonaria transposable element (carb-TE) repeat unit and repeat unit
junctions, implying that there are relatively few genomic copies of
the complete carb-TE. No nucleotide or translated BLAST hits were
found in any relevant database, with the exception of B. betularia
RNA-sequencing (RNA-seq) reads (NCBI: SRX371328), indicating
that the carb-TE repeat unit is Biston-specific.

To examine patterns of recombination, which provide insight into
the evolutionary dynamics of a chromosomal region, we genotyped
the same 105 carbonaria and a subset of 37 typica, plus 35 insularia,
at 119 polymorphic loci within 28 PCR fragments distributed across
200 kb either side of the carb-TE (Fig. 1a). Diploid genotypes were
phased, and the resulting haplotypes divided into those with and
without the carb-TE. The sequence identity of the ancestral carbonaria
haplotype, whose core was known from the BAC and fosmid work,
was extended by assigning allelic state at each marker locus to ancestral
carbonaria or typica/insularia. Fifty per cent of carb-TE haplotypes had
retained the ancestral carbonaria haplotype across the full 400-kb
window, and the remainder showed varying degrees of recombination with
typica haplotypes on one or both sides of the causal mutation (Fig. 2a).
The recent selective sweep¹⁵ is reflected by declining linkage
disequilibrium between the carbonaria locus and marker loci with increasing
genetic distance (Fig. 2b). The tenure of the carb-TE has been
transient, having declined from ~99% to less than 5% in its industrial
heartland since 1970 (ref. 16). It has nevertheless left a substantial trace
of its former abundance in the form of ancestral carbonaria haplotype
blocks introgressed into typica and insularia haplotypes, consistent
with the simulation-based expectation (Fig. 2c).

Figure 1 | The carbonaria candidate region, and the position and
structure of the carbonaria mutation. a, Approximately 400-kb candidate
region (bounded by marker loci b and d (ref. 2)) indicating gene content
and genotyping positions (vertical lines in the continuous grey bar).
Intron-exon structure and orientation are illustrated separately for each
gene (annotated in GenBank accession KT182637). b, Refined candidate
region including candidate polymorphisms (lines on the grey bar). The
intron-exon structure of cortex is shown for carbonaria (black moth)
and typica (speckled moth), highlighting the presence of a large (22 kb)
indel (orange) within the first intron. Exons 1A and 1B are alternative
transcription starts followed by the shared exons 2–9. c, The only exclusive
carbonaria-typica polymorphism within the candidate region. The
structure of the insert, shown in the carbonaria sequence, corresponds to
a class II DNA transposon, with direct repeats resulting from target site
duplication (black nucleotides) next to inverted repeats (red nucleotides).
Typica haplotypes (lower sequence) lack the 4-base target site duplication,
the inverted repeats and the core insert sequence. The transposon consists
of ~9 kb tandemly repeated two and one-third times (repeat unit
(RU)1-RU3), with three short tandem subrepeat units (green dots,
SRU1-SRU9) within each repeat unit. Moth images were created from
photographs taken by A.E.v.H.

¹Institute of Integrative Biology, University of Liverpool, Biosciences Building, Crown Street, Liverpool L69 7ZB, UK.
²Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.
*These authors contributed equally to this work.


The first reported sighting of the carbonaria form is generally
regarded as having occurred in 1848 in Manchester¹⁷, although the
wording of the record implies that it was rare but not completely
unknown at this time. Establishing how long before this date the
carbonaria mutation occurred is complicated because it could
have existed undetected at a low frequency for hundreds of years
(Supplementary Methods). Our approach to this problem was to
infer the age of the mutation event independently by considering the
erosion of the ancestral carbonaria haplotype due to genetic
recombination and mutation. One million simulated time trajectories of the
carbonaria phenotype were randomly drawn according to their fit to
historical frequency data (Extended Data Fig. 4). Based on these
trajectories, recombination patterns were simulated using an empirical
estimate of recombination rate and compared to the observed
recombination pattern of the carbonaria haplotypes. The probability density
for the date of the carb-TE mutation event (Fig. 2d) is highly skewed
(median, 1763; interquartile range, 1681–1806) with a maximum
likelihood at 1819, a date highly consistent with a detectable frequency
being achieved in the mid-1840s.

The position of the carb-TE suggests that its effect on melanization
is achieved by altering the expression of cortex through one of several
potential mechanisms¹⁸ (incorporation of any part of the carb-TE
into cortex transcripts has been excluded). Biston cortex is
characterized by numerous splice isoforms and alternative first exons; we
focus on the population of transcripts initiated by exons 1A and 1B,
as the other first exons are absent or only weakly expressed in Biston
wing discs, and did not exhibit morph-specific differences (Extended
Data Fig. 5). The global pattern of splice isoforms showed neither
consistent presence or absence nor crude relative abundance differences
among morphs for any developmental stage (Extended Data Figs 6, 7).
Cumulative expression across all isoforms (Fig. 3a) increases by an
order of magnitude between the sixth larval instar (La6) and day 4
prepupa (Cr4), coinciding with a phase of rapid wing disc
morphogenesis (Fig. 3b), and falls back to a low level by day 6 prepupa (Cr6) with
no clear difference among morphs (t/t versus c/t, P > 0.5). To exclude
interference by potentially non-functional isoforms, we targeted full
transcripts only, starting with either 1A or 1B. The abundance of the
1B full transcript shows a consistent trend across several families with
different genetic backgrounds (c/c > c/t > t/t) that is most pronounced
at Cr4 (Fig. 3c and Extended Data Fig. 8a). The abundance of the
1A-initiated full transcript, which is in general an order of magnitude
less than that of the 1B transcript, does not show a significant
difference between genotypes (Fig. 3d and Extended Data Fig. 8b).

Figure 2 | Recombination pattern and ageing of
the carb-TE mutation. a, Nearest recombination
between carbonaria (carb-TE present (orange))
and non-carbonaria (typica and insularia (light
grey)) haplotypes (n = 107), 200 kb either side of
the carb-TE (at position 0). Dark grey
areas indicate boundaries within which
recombination occurred. b, Multilocus linkage
disequilibrium (r̄d) across the same sequence
window among carbonaria and non-carbonaria
haplotypes. Grey area indicates the widest 99%
confidence region, across loci, for the null
hypothesis (r̄d = 0). Red lines represent the
simulation-based upper bound under the extreme
assumption that all alleles defining the carbonaria
haplotype were initially exclusive to it (mean and
90% interval). c, Introgression of the ancestral
carbonaria haplotype (black) into non-carbonaria
haplotypes (grey; carb-TE absent (n = 144)). Red
lines represent the simulation-based expectations
(mean and 90% interval). d, Probability density
for the age of the carb-TE mutation inferred from
the recombination pattern in the carbonaria
haplotypes (maximum density at 1819 shown
by dotted line; first record of carbonaria in 1848
shown by dashed line).

The role of cortex in wing pattern melanization is not obvious.
In Drosophila, cortex has been primarily associated with meiosis in
ovaries¹¹ (several cortex transcripts are expressed in B. betularia ovaries
and testes; Extended Data Fig. 5). The molecular function of cortex is
suggested by phylogenetic analysis, which indicates that Biston cortex
occurs in a lepidopteran sub-group within an insect-specific clade of
a protein family containing the cell-cycle regulators cdc20 and cdh1,
encoded by fzy and rap (also known as fzr) in Drosophila (Extended
Data Fig. 9b). These proteins help to regulate fundamental cell division
processes such as cytokinesis by presenting substrates to, and activating,
the anaphase-promoting complex or cyclosome (APC/C), which
ubiquitinates cell-cycle proteins, thereby earmarking them for degradation.
Substrate recognition occurs by binding to degrons (short linear motifs
such as the D box and KEN box). Sequence conservation across
lepidopterans and non-lepidopterans reveals a single binding site in cortex
(Extended Data Fig. 9c) that probably binds the D box-like¹⁹ degron
LXEXXXN. This degron binding capability is predicted for both of the
full isoforms (1A (441 amino acids) and 1B (407 amino acids), although
1B apparently lacks the N-terminal C box that is usually required for
APC/C binding) but not for the alternative isoforms (Extended Data
Table 1). These data demonstrate orthology and are consistent with
shared function of cortex between D. melanogaster and B. betularia,
although the molecular connection between cell-cycle protein
degradation at the APC/C and melanization remains to be determined.

Figure 3 | Relative expression of cortex in
developing wings of B. betularia. a, Average
expression (across typica and carbonaria morphs)
of all cortex splice variants (exons 7–9) relative to
the control gene α-Spec in wing discs at different
developmental stages (La6, sixth instar larvae;
Cr2, day 2 crawler; Pu2, day 2 pupae; PDP,
post-diapause pupae). Bars are s.e.m. b, Scaled images
(created from photographs taken by I.J.S.) of
B. betularia forewings at different stages.
c, d, Tukey plots for relative expression of cortex
1B (c) and 1A (d) full transcripts in developing
wings of the three carbonaria-locus genotypes
(c/c, c/t and t/t) produced within the progeny of a
c/t × c/t cross (no data for c/c at Cr2). Genotypes
differ significantly for the 1B full transcript
(P < 0.001, generalized linear model (GLM)),
whereas genotypes do not differ for the 1A full
transcript (P > 0.2, GLM). (Note the differing
y-axis scales.) Equivalent graphs for the progeny
of c/t × t/t crosses (which lack the c/c genotype)
are presented in Extended Data Fig. 8.

Our results suggest that the carb-TE influences adult melanization
pattern by increasing the abundance of cortex, perhaps by altering the
course of scale-cell heterochrony, with dominance arising through a
threshold effect (the 1B full transcript is more abundant in c/c than
c/t). How the carb-TE promotes cortex expression is unknown but the
general mechanism is predicted to allow the production of insularia
morphs that are putatively controlled by different mutations within
cortex. In combination with parallel findings in Heliconius butterflies²⁰,
our results support the idea that cortex is a conserved developmental
node for generating colour pattern variation in evolutionarily diverse
Lepidoptera. However, cortex may not be the only gene in this region
involved in patterning, as suggested by recent work on the B. mori
mutant Black moth, which has a similar phenotype to B. betularia
carbonaria²¹, although none of the genes implicated is differentially
expressed among carbonaria and typica wing discs.

The carb-TE is a spectacular example of an adaptively advantageous
transposon²²–²⁴; its discovery fills a fundamental gap in the peppered
moth story and furthers our appreciation of the mechanism
underpinning rapid adaptation. A consensus on the general importance of
transposable elements for adaptive evolution has yet to emerge³,²⁵.
Over longer time frames, phenotypic effects of transposable elements
may be obscured by imprecise excision that leaves a minimal trace
of the transposable element while retaining the mutant (adaptive)
phenotype²⁶. By contrast, we have shown that the carb-TE is young,
approximately 200 years (generations) old, during which time it has
gone from a single mutation to near fixation (regionally) to near
extinction, driven by a pulse of environmental change.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


No formal statistical methods were used to predetermine sample sizes. The
experiments were not randomized, and investigators were not blinded to allocation
during experiments and outcome assessment.

Wild samples. Moths used for fine mapping and ageing analysis came from a
northwest England-north Wales transect sampled in 2002 (ref. 12), with 12
carbonaria and 6 insularia specimens additionally collected in 2005–2009.
Reference sequences. An extended BAC tiling path was constructed using mapped
B. betularia genes²⁷, B. mori nscaf2829 (SilkDB) orthologues and BAC-end
sequences as probes. Combinatorial PCR using BAC-end sequences and internal
gene anchors was used to determine the relative positions of the BACs. Fosmids
were used to bridge gaps. A minimal tiling path was sequenced as a 3-kb
mate-pair library with Roche 454 GS FLX Titanium. Reads were assembled into contigs
using Newbler and manually scaffolded using tiled BAC-end sequences and exon
order of genes spanning multiple contigs as anchors. The scaffold covers a 3.6-Mb
region spanning the mapped genes Mhc to leucine-rich transmembrane protein
(lrtp) (GenBank accession numbers HM449891 and HM449887, respectively)
with the carbonaria polymorphism located towards the centre. A recombination
rate estimate within this region of 2.9 cM per Mb was obtained from a total of 350
offspring in 8 crosses screened for recombination between the ends of the 3.6-Mb
interval. Three typica and one carbonaria haplotype sequences were reconstructed
using BACs and fosmids for the region spanning locus b-d (Fig. 1a). Clones were
assigned to haplotypes on the basis of co-segregation of genotypes and phenotypes
between parents and sibs of the heterozygous (carbonaria-typica) individuals used
to generate the BAC (family 67) and fosmid (family 11) libraries. Small
assembly gaps caused by repetitive sections were bridged using long capillary Sanger
sequences; fosmid clone 25H14, containing the large repetitive transposable
element, was sequenced using Pacific Biosciences RS II to 300× coverage using P4-C2
chemistry and assembled using HGAP v2 (Pacific Biosciences). Homopolymer
length variation, which is often caused by 454 errors rather than true
polymorphisms, was verified by Sanger sequencing.

A draft genome assembly was generated from an individual homozygous for the 
carbonaria region. Full-sib carbonaria-typica heterozygotes were crossed (family 
135) to produce homozygous carbonaria offspring, as well as heterozygotes and 
homozygous typica. The carbonaria homozygotes were identified using alleles 
closely linked to the carbonaria locus, with more distant loci on either side used 
to ensure that the haplotype had not been disrupted by recombination. DNA was 
prepared by phenol-chloroform extraction from a final instar male larva with the 
gut removed. The genome was sequenced to ~3.5× coverage on a 454 FLX+
platform and a draft assembly constructed using Newbler. The genome assembly was
used for polymorphism discovery, and for tiling path construction using homology 
to B. mori. Single read coverage was used to detect repetitive regions, aiding in 
single-target primer design, and to confirm the repetitive nature of the carb-TE. 

The gene content of the b-d interval was examined by comparing its sequence 
with GenBank proteins, expressed sequence tags (ESTs), transcriptomes, and 
annotated genes in the orthologous region in other Lepidoptera. Tblastx against
these orthologous regions and Augustus²⁸ gene prediction were used to detect
potentially overlooked genes. All genes were manually annotated and (except for
vepl) confirmed using cDNA. The annotation of 11 genes (not including cortex)
was also subsequently confirmed against a B. betularia transcriptome (GenBank
SRX371328) assembled with Trinity²⁹. MicroRNAs were found using miRBase with
blastn including hairpin precursors. BLAST (blastn, blastx) searches for
carb-TE-like sequences were performed on NCBI databases (GenBank nucleotide, protein,
EST, transcriptome), independently curated lepidopteran genome assemblies (for 
example, SilkDB), and RepBase (19.09). 

Fine mapping. The interval containing the carbonaria polymorphism was
narrowed down to a section bordered on both sides by evidence of carbonaria
haplotype breakdown caused by recombination. Polymorphisms at regular intervals in
the b-d region (Fig. 1a; Supplementary Table 2) were genotyped in wild-caught
carbonaria, typica and insularia (105, 33 and 30 individuals, respectively). We
conservatively used only homozygous genotypes to set these boundaries because
the dominance of carbonaria obscures the assignment of alleles in heterozygous
genotypes to a certain morph haplotype. The four contiguous haplotype sequences
(one carbonaria and three typica) constructed from BACs and fosmids were aligned
between these narrowed-down boundaries and examined for polymorphisms that
were distinct in the carbonaria haplotype relative to all three typica haplotypes,
resulting in 87 carbonaria candidate polymorphisms (Extended Data Fig. 1 and
Supplementary Table 1). With the exception of carbonaria_candidate_45,
wild-caught typica were genotyped at all loci by means of PCR–restriction-fragment
length polymorphism (RFLP), PCR-indel or sequencing. Depending on the
frequency of the candidate alleles in the typica sample, 16 to 283 typica (32 to 566
typica haplotypes) were used for exclusion. Carbonaria_candidate_25 was present
in only one out of 566 typica haplotypes. The typica phenotype of this individual
(12-2002-01) was confirmed, as was the presence of the carbonaria_candidate_25
allele from independently extracted DNA. A very large indel, later identified as
the true carbonaria polymorphism (carbonaria_candidate_45), that could not be
bridged by PCR required an alternative present/absent screening approach which
also provided a positive control for absence haplotypes (to distinguish insert
absence from PCR failure). A three-primer PCR was designed with two
primers flanking the indel and a third within the insert, relatively close to the indel
boundary (Extended Data Fig. 2). The assay was validated using a family known
to include all three genotypes (family 135, Extended Data Fig. 2).
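
The way a three-primer assay of this kind separates the three genotypes from a failed reaction can be written as a small decision rule. The sketch below is an interpretation of the description above rather than the published scoring protocol: it assumes that the two flanking primers yield a product only from the insert-free (typica) allele, because the insert is too long to bridge, and that the flanking plus internal primer pair yields a product only from the insert-bearing allele. The function name is hypothetical.

```python
def call_genotype(flanking_band, insert_band):
    """Call the carbonaria-locus genotype from two band observations.

    flanking_band: product of the two flanking primers (insert absent).
    insert_band:   product of a flanking primer and the internal primer
                   (insert present).
    """
    if flanking_band and insert_band:
        return "c/t"          # one allele of each kind
    if insert_band:
        return "c/c"          # only insert-bearing alleles amplify
    if flanking_band:
        return "t/t"          # positive control for insert absence
    return "PCR failure"      # no product at all, genotype unknown


for bands in [(True, True), (False, True), (True, False), (False, False)]:
    print(bands, "->", call_genotype(*bands))
```

The last branch is the point of the design: without the flanking product, a missing insert band could not be told apart from a reaction that simply failed.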

Inferring haplotypes and the age of the carbonaria mutation. A set of 177
individuals, including 105 carbonaria individuals, was genotyped at 119 polymorphic
loci within 28 PCR products, stretching across ~400 kb (Supplementary Table 2).
Carbonaria haplotypes were inferred using SHAPEIT³⁰ and the position (interval)
of recombination breakpoints inferred based on two or more consecutive
phase-switched polymorphisms. High repeatability of the phasing outcomes was verified
by resampling, and switch errors were minimized by including known haplotypes
and classifying only two types (melanic and non-melanic). Indices of multilocus
linkage disequilibrium (r̄d) were calculated from polymorphisms within each PCR
fragment and the carbonaria locus across the 400-kb interval³¹. Their significance
was assessed using 999 Monte-Carlo permutations. The pattern of introgression of
the carbonaria haplotype into background haplotypes (that is, typica and insularia
morph alleles) was assessed using ChromoPainter v2 (ref. 32) to search for
contiguous blocks that match the carbonaria haplotype, thus generating the ‘expectation
painting’ of background haplotypes.
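
A Monte-Carlo permutation test of linkage disequilibrium, as mentioned above, can be sketched in a few lines. For brevity the example uses the pairwise r² between the carbonaria locus and a single biallelic marker as a stand-in for the multilocus index, which is a simplification made here and not the statistic used in the paper; the haplotype data and function names are hypothetical.

```python
import random


def r_squared(locus_a, locus_b):
    """Pairwise r^2 between two biallelic loci coded 0/1 across haplotypes."""
    n = len(locus_a)
    p_a = sum(locus_a) / n
    p_b = sum(locus_b) / n
    p_ab = sum(a * b for a, b in zip(locus_a, locus_b)) / n
    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denominator if denominator > 0 else 0.0


def permutation_p_value(locus_a, locus_b, n_permutations=999, seed=0):
    """Shuffle one locus to break the association and count extreme values."""
    rng = random.Random(seed)
    observed = r_squared(locus_a, locus_b)
    shuffled = list(locus_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if r_squared(locus_a, shuffled) >= observed:
            at_least_as_extreme += 1
    # The observed arrangement counts as one of the permutations.
    return observed, (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical haplotypes: 1 = carb-TE present / marker allele of the
# ancestral carbonaria haplotype, 0 = otherwise.
carb_te = [1] * 20 + [0] * 20
marker = [1] * 18 + [0] * 2 + [0] * 18 + [1] * 2

observed_r2, p_value = permutation_p_value(carb_te, marker)
print(observed_r2, p_value)
```

With 999 permutations the smallest attainable P value is 0.001, which is why that number of permutations is a common choice.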

The age of the carbonaria mutation was inferred with a simulation-based
approach. The analysis was performed in three steps. First, 1,000,000
time-forward trajectories of the carbonaria phenotype were sampled, using a
Metropolis–Hastings algorithm, depending on their likelihood given historical
phenotypic frequencies (Supplementary Table 3), and conditional on their starting
date (x₀) and population size (N). Second, recombination patterns were simulated
using the sampled trajectories, in populations of size N, and a fixed recombination
rate of 2.9 cM per Mb (males only). This process yielded sample distributions of
the closest recombination breakpoint relative to the carbonaria locus. Finally, the
likelihood of the simulated distributions given the empirical recombination pattern
was computed and averaged across simulations to estimate the probability density
of the mutation age (x₀). For full details, see Supplementary Methods.
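
The reason the frequency trajectory matters for dating can be seen in a deliberately crude toy calculation, which is not the authors' simulation. It assumes one generation per year, no crossing over in females (so the male rate is halved), a detectable recombination event only when a carb-TE haplotype pairs with a non-carbonaria haplotype, and a binomial likelihood for the number of haplotypes still intact across the 400-kb window. The counts (50 intact out of 100) and all function names are hypothetical.

```python
import math

MALE_RATE_CM_PER_MB = 2.9   # rate quoted in the text (males only)
WINDOW_MB = 0.4             # the ~400-kb genotyped window
# Assumption: females do not recombine, so the sex-averaged rate is halved.
WINDOW_MORGANS = 0.5 * MALE_RATE_CM_PER_MB * WINDOW_MB / 100.0


def intact_probability(carbonaria_freqs, window_morgans=WINDOW_MORGANS):
    """Chance that a carb-TE haplotype is never visibly recombined.

    carbonaria_freqs holds one carbonaria allele frequency per generation;
    a crossover is only visible if the partner haplotype is non-carbonaria.
    """
    log_p = 0.0
    for freq in carbonaria_freqs:
        log_p += math.log1p(-window_morgans * (1.0 - freq))
    return math.exp(log_p)


def binomial_log_likelihood(k, n, p):
    """Log-probability of k intact haplotypes out of n."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def most_likely_age(k_intact, n_haplotypes, ages, trajectory_for_age):
    """Age (in generations) that maximizes the toy likelihood."""
    return max(ages, key=lambda age: binomial_log_likelihood(
        k_intact, n_haplotypes, intact_probability(trajectory_for_age(age))))


def always_rare(age):
    """Hypothetical trajectory: carbonaria stays rare in every generation."""
    return [0.0] * age


best_age = most_likely_age(50, 100, range(20, 401), always_rare)
print(best_age)
```

Under the always-rare trajectory the toy estimate is 119 generations, far younger than the published maximum-likelihood date of 1819 relative to sampling in 2002. A trajectory in which carbonaria spends many generations at high frequency makes most crossovers invisible and pushes the estimate back in time, which is presumably why the analysis conditions on sampled frequency trajectories instead of a constant decay rate.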

Code availability. Code available on request. 

Expression and alternative transcripts of cortex. Offspring from either
heterozygous carbonaria/typica (c/t) × homozygous t/t crosses segregating 1:1 or c/t × c/t
crosses segregating 1 c/c: 2 c/t: 1 t/t were used for end-point reverse transcription
PCR (RT-PCR) and real-time quantitative PCR (qPCR) experiments. Caterpillars
were reared on grey willow (Salix cinerea). Wing discs (forewings and hindwings)
were dissected from final (sixth) instar larvae, crawlers or prepupae (days 2–6 from
the start of crawling stage), pre-diapause pupae (days 2–8 from pupation, at which
point they have entered diapause) and post-diapause pupae (wing discs staged
into six categories), and stored in RNAlater (Ambion). RNA was extracted with
TRIzol and cDNA synthesized with SuperScript III (Invitrogen)-oligo(dT). The
genotype-phenotype (adult morph) of each wing disc specimen was determined
with the carb-TE three-primer PCR (and verified by sequencing a linked single
nucleotide polymorphism (SNP), carbonaria_candidate_25). Relative abundance
and qPCR data were analysed using generalized linear (mixed) models (GLM).
See Extended Data Fig. 8c for sample sizes.

Quantitative PCR experiments were designed to measure the relative abundance
of cortex transcripts, either of all transcripts combined (using primers in exons
7 and 9) or full transcripts only (primers in exons 1A-3 and 1B-3, as exon 3 is
effectively exclusive to the full transcripts (Extended Data Fig. 7)). DNase treatment
was not performed, but for exons 7–9 qPCR co-amplification of genomic DNA
was prevented by positioning the reverse primer on the exon 8-9 boundary (this
was not a concern for exons 1-3 qPCR because the large first intron precluded
genomic DNA amplification). We chose 40S ribosomal protein S3a (RpS3A)³³ and
α-Spec³⁴ as two single-copy autosomal housekeeping genes. Primer sequences are
listed in Supplementary Table 4. Annealing temperatures were optimised to 66 °C
and amplicons were confirmed to produce single bands on agarose gels. cDNA was
diluted 1:1 with water to allow template volumes within the accuracy range of the
pipette used. Quantitative PCRs for target and control were run in three replicates
using Kapa SYBR Fast qPCR Universal under recommended conditions on a Roche
LightCycler 480 with 45 cycles and a melting curve. As both control genes gave
similar results, only α-Spec was used for the entire sample.
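
Expression of a target relative to a control gene is commonly obtained from qPCR threshold cycles with a delta-Ct calculation. The text does not state which quantification formula was used, so the sketch below is an assumption: it averages technical replicates and assumes equal amplification efficiency (a doubling per cycle) for target and control. The Ct values and function names are hypothetical.

```python
def mean(values):
    """Arithmetic mean of technical replicates."""
    return sum(values) / len(values)


def relative_expression(ct_target, ct_control, efficiency=2.0):
    """Target abundance relative to the control gene (delta-Ct method).

    Assumes both amplicons amplify with the same per-cycle efficiency.
    """
    return efficiency ** (ct_control - ct_target)


# Hypothetical triplicate Ct values for one wing disc sample.
cortex_1b_ct = [27.1, 27.0, 26.9]
control_ct = [20.0, 20.1, 19.9]

ratio = relative_expression(mean(cortex_1b_ct), mean(control_ct))
print(ratio)
```

A target that crosses the threshold seven cycles later than the control is thus estimated at roughly 1/128 of the control's abundance.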

Alternative transcription starts of cortex were searched for using 5’ rapid
amplification of cDNA ends (RACE) on RNA extracted from 15 wing disc
samples covering a wide range of stages and c/c, c/t and t/t genotypes, and also from
whole pupae and testes. Cortex-specific cDNA was synthesized with SuperScript
III and a gene-specific negative strand primer; 5’ cytosine extension was added
using terminal transferase (NEB) and deoxycytidine triphosphates (dCTPs). The
single-stranded cDNA was made double-stranded and a target sequence for
amplification incorporated in a single extension cycle (LongAmp Hot Start, NEB) with
an oligonucleotide containing a 5’ primer recognition site and a 3’ poly-G tail. PCR
was performed using a forward primer matching the synthetic 5’ end and a nested
cortex-specific reverse primer. The amplicons were sequenced using a second
nested primer. The alternative first exons were confirmed by Sanger sequencing
with forward primers inside the newfound exons to generate clean sequence
without the background noise commonly observed with 5’ RACE.

The complete pattern of cortex splice variation was examined with end-point 
RT-PCR using primers Bb_cort_exon1A_F or Bb_cort_exon1B_F and Bb_cort 
exon9_R (for primer sequences see Supplementary Table 4). PCR conditions 
were 60°C annealing, 40 cycles, 75s extension, 25 il total volume, 3 jl wing disc 
cDNA, LongAmp Taq DNA polymerase (NEB). A Fragment Analyzer (Advanced 
Analytical) was used to estimate the size and relative abundance of amplicons 
within each individual, after normalizing samples to a concentration range of 
~1-10ng ,l-!. The concentration of each fragment peak was calculated using 
PROSize (Advanced Analytical), and the relative abundance was computed as the 
concentration of a splice variant divided by the sum of all fragment concentrations 
within that individual profile. The cortex splice variant amplicons were sequenced 
as two pools (t/t and c/t) using Pacific Biosciences RS II with P6-C4 chemistry and
the insert reads extracted using SMRT Portal (Pacific Biosciences). Reads that
contained exon 1A or 1B and exon 9 were used to validate the sequence composition
and relative abundance of spliced gene isoforms. 
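
The relative-abundance calculation described above can be sketched in a few lines of Python. This is only an illustration of the stated formula (one variant's concentration divided by the sum of all fragment concentrations in that individual's profile); the function name, the variant labels and the concentrations are hypothetical and are not taken from the PROSize output.

```python
def relative_abundance(concentrations):
    """Return each fragment's share of the total concentration in one profile."""
    total = sum(concentrations.values())
    if total <= 0:
        raise ValueError("profile contains no measurable fragments")
    return {name: conc / total for name, conc in concentrations.items()}


# Hypothetical fragment concentrations (ng per µl) for a single individual.
profile = {"full": 6.0, "variant_2": 3.0, "variant_3": 1.0}
shares = relative_abundance(profile)
print(shares)
```

Because the shares are computed within each individual, they sum to 1 per profile and can be compared across individuals that differ in total cDNA yield.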

No part of carb-TE was detected in cortex transcripts, either with PacBio 
sequencing or with PCR using various primer combinations where one primer 
lies within the transposon and the other matches a cortex exon. However, a carb- 
TE-like partial sequence was amplified (with primers within repeat units) from 
both typica and carbonaria morph cDNA synthesized using carb-TE primers, 
implying that these RNA sequences are transcribed from non-allelic homologues 
of the carb-TE. 

Expression of alternative candidate genes. Two B. mori adult melanism/pattern- 
ing mutations, Black moth (Bm) and Wild wing spot (Ws), were recently mapped to 
a region partially orthologous to the carbonaria interval". In this study, end-point 
PCR showed complete absence of cortex expression in pupal stages and adults but 
potentially important prepupal stages were not examined. Three neighbouring 
genes (BGIBMGA005658, BGIBMGA005657 and BGIBMGA005655) did show 
convincing differences between the wild type and both mutants even though these 
genes lie outside the Ws mapping interval. We performed equivalent end-point 
RT-PCR for the three orthologues in B. betularia to determine whether morph- 
gene expression associations existed between carbonaria and typica (comparing 
c/t and t/t genotypes for wing disc stages Cr4, Cr6, Pu2, Pu4 and PDP). PCR 
conditions were as for cortex 1A/1B-9 end-point PCRs, except for 45-s extension 
(for primer sequences, see Supplementary Table 4). 

Cortex phylogeny and protein modelling. Cortex sequences derived from data- 
base searches (Supplementary Table 5) were supplemented with a selection of 
cdh1 and cdc20/fzy sequences from model organisms and the set aligned with 
MAFFT* (Supplementary Data 2). The central propeller domain was isolated and 
used for bootstrapped phylogenetic analysis with MEGA 6 (ref. 36) employing its 
Maximum Likelihood algorithm and the JTT matrix-based model. Any gapped 
positions were ignored. Homology models of B. betularia and D. melanogaster 
cortex proteins were made with MODELLER” and Consurf*® was used to map 
protein sequence conservation to their respective surfaces among lepidopteran or 
non-lepidopteran cortex proteins. 
Extended Data Figure 1 | BAC and fosmid haplotype tilepaths used 
to define carbonaria candidate polymorphisms. a, BAC and fosmid 
tilepaths of the carbonaria haplotype (black bars) and three typica 
haplotypes (different shades of grey). Two small regions not covered by 
BACs or fosmids were reconstructed using parent and offspring sequences 
from the same heterozygous family (FAM11). The positions of loci b and d 
(see Figure 1) are indicated by the dashed lines, and the carbonaria 
candidate region is highlighted blue. Fosmid 25H14 containing carb-TE 
appears small because it is aligned against the typica reference sequence, 
which does not include the carb-TE. b, Alignment of three typica 
haplotypes against the carbonaria haplotype for a short section within the 
carbonaria candidate region, showing SNPs (dots are nucleotides identical 
to the carbonaria sequence). Polymorphisms in which all three typica 
alleles differed from carbonaria were treated as carbonaria candidates; 
polymorphisms in which the same allele occurred in carbonaria and at 
least one typica were excluded from further consideration. 
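
The candidate filter described in this caption can be expressed as a short Python sketch. It is an illustration only: the site positions and alleles are invented, the function name is hypothetical, and it assumes that the three typica haplotypes need not agree with one another as long as none matches carbonaria.

```python
def is_carbonaria_candidate(carbonaria_allele, typica_alleles):
    """True only if every typica haplotype differs from carbonaria at this site."""
    return all(allele != carbonaria_allele for allele in typica_alleles)


# Hypothetical aligned sites: (position, carbonaria allele, three typica alleles).
sites = [
    (101, "A", ("G", "G", "G")),  # all typica differ: candidate
    (102, "C", ("C", "T", "T")),  # allele shared with one typica: excluded
    (103, "T", ("A", "C", "A")),  # all typica differ: candidate
]

candidates = [pos for pos, carb, typ in sites if is_carbonaria_candidate(carb, typ)]
print(candidates)
```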
Extended Data Figure 2 | Validation of the 3-primer PCR carb-TE 
genotyping assay in a family and its application in a variety of wild- 
caught moths. a, Schematic alignment of carbonaria and typica haplotypes 
showing the position of the three primers (A, B and C, not to scale) used 
in the same PCR to detect the presence and absence of the 22 kb carb-TE. 
In the presence of the carb-TE, primers A and C are too far apart to 
generate a product; the repeat structure of the carb-TE presents three 
annealing sites for primer B but only the shortest primer B-C combination 
is amplified when using 45-s extension (primer sequences are listed in 
Supplementary Table 1). b, carb-TE genotypes for father (lane 2), mother 
(lane 3) and 15 offspring (lanes 4-18); the two brightest bands in the size 
ladder are 300 bp and 1 kb (lane 1). The parents were full siblings and 
known to be heterozygous (c/t), and therefore expected to generate c/c, c/t 
and t/t offspring. The larger band (primers B-C) indicates the presence of 
the carb-TE and the smaller band (primers A-C) its absence (typica allele 
in this family); heterozygotes have both bands. The individual in lane 15 
(135F1-12) is the homozygous male used for whole genome sequencing. 
c, Presence or absence of the carb-TE in a carbonaria haplotype fosmid 
clone (lane 2), three different typica haplotype clones (lanes 3-5; one 
fosmid, two BACs), wild carbonaria homozygotes (lanes 6 and 7), wild 
carbonaria heterozygotes (lanes 8-10), typica with a flanking haplotype 
similar to the carbonaria haplotype but lacking the carb-TE (lanes 11-13), 
light insularia (lanes 14-16), intermediate insularia (lanes 17-19), dark 
insularia (lanes 20-22) and carbonaria-like insularia (lanes 23-25). 
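
The band-scoring logic of the three-primer assay can be sketched as follows. This is an illustrative reading of the caption, not the authors' scoring procedure: the function name and the boolean band inputs are assumptions, and "t" here simply denotes an allele lacking the carb-TE.

```python
def call_carb_te_genotype(has_bc_band, has_ac_band):
    """Score a lane from the presence of the larger B-C and smaller A-C bands."""
    if has_bc_band and has_ac_band:
        return "c/t"  # both bands: heterozygote
    if has_bc_band:
        return "c/c"  # only the B-C band: carb-TE on both chromosomes
    if has_ac_band:
        return "t/t"  # only the A-C band: carb-TE absent
    return None       # no product: PCR failed, genotype not called


# Hypothetical gel lanes as (B-C band present, A-C band present).
lanes = [(True, True), (True, False), (False, True), (False, False)]
calls = [call_carb_te_genotype(bc, ac) for bc, ac in lanes]
print(calls)
```

Returning `None` for a lane with no product keeps failed reactions distinct from a true absence call.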
Extended Data Figure 3 | Hypothetical reconstruction of the birth
of the carbonaria allele. Class II non-autonomous DNA transposition
is mediated by two transposase monomers linked to terminal inverted
repeats (TIR). The monomers form a dimer at the target site that is
cleaved to leave short direct repeated overhangs. The transposable element
including TIRs is inserted and finally the single-stranded cleaved sites are
filled in to complete the target site duplication*’. The unduplicated target
site motif (CCTC) is common, possibly ubiquitous, in all non-carbonaria
(typica and insularia) haplotypes, but a typica ancestor is more likely given
the pattern of haplotype similarities and the presumed prevalence of typica
haplotypes around 1800.
Extended Data Figure 4 | The rise and fall of carbonaria in the
Manchester area. a, Frequency of the carbonaria phenotype from ~1800 to
2009. b, Corresponding frequencies of the carbonaria allele. The envelopes
show the confidence intervals (50%, 90% and 99%) for the simulated
trajectories. Dark-red dots, observations falling within the simulated
trajectories; orange dots, additional data collected after 2002 (year during
which >85% of the field sample was collected). Stars indicate likely
frequencies where historical data are scarce. Data and sources are listed in
Supplementary Table 3.
Extended Data Figure 5 | Position and tissue-specific expression of
alternative first exons of the gene cortex. a, Illustration of cortex exon
structure indicating the positions of thirteen alternative transcription
starts and subsequent exons relative to the flanking genes in the b-d
region (position of carb-TE indicated by orange bar). b, Expression of
different starting position cortex transcripts. End-point RT-PCR with
reduced cycles (35) was used to exclude transcripts with negligible
dosage. Amplicon intensities are scaled between + (faint but visible) and
++++ (strong PCR product). Negative PCRs represent expression below
the detection threshold; this may even occur in ‘origin’ tissue types (wing
disc/pupa/testes) in which the alternative starts were discovered owing to
the fact that 5’ RACE used ~20 times the amount of RNA template relative
to the standard cDNA synthesis for the 35 cycle end-point PCRs. Ovaries
were not used for 5’ RACE, which may have caused gonad expression bias
towards testes. Test tissues are sixth instar larvae gonads and wing discs at
different developmental stages (abbreviations as in Fig. 3).
Extended Data Figure 6 | Examples of cortex splice variation pattern in
typica and carbonaria developing wing discs. End-point PCR on wing
disc cDNA amplified with primers in the first and last exons (E1-E9), with
typica individuals to the left of the central ladder (the two brightest bands
in the size ladder are 300 bp and 1 kb) and carbonaria individuals (all c/t
heterozygotes) to the right of the central ladder. a, Exon 1A variants in Cr2
stage. b, Exon 1B variants in Cr4 stage. (See Fig. 3 for stage abbreviations.)
Extended Data Figure 7 | Exonic structure and size distributions of 
cortex splice variants amplified by end-point RT-PCR with primers 
in exon 1A or 1B and exon 9. Size distributions of the PacBio reads are 
displayed for the two alternative first exons 1A (a) and 1B (b) of cortex. 

c, d, Comparison of carbonaria locus genotypes (t/t pale blue fill, c/t light 
blue line, c/c dark blue line) measured with Fragment Analyzer. Relative 
fluorescence units (RFU) were averaged across individuals for fragments 
amplified with E1A-E9 (c) or E1B-E9 (d) primers. Prior to averaging, 
RFUs were standardized so that the total fluorescence (area under the 
curve) per individual scaled to 1. Arrows with the same numbers denote 
either similar exonic structure (E1A versus E1B variants) or fragment 
identity between the two sources of data (PacBio reads and Fragment 
Analyzer). Exonic structure of the six main splice variants is represented 
in matrices (a, b), in which white cells represent skipped exons in a splice 
variant (asterisk indicates full transcript in which the first 71 bp of exon 6 
are missing). Apparent differences among melanic and non-melanic for 
1A number 2 and number 3 splice variants were not consistent among 
families. 
Extended Data Figure 8 | Tukey plots for relative expression of cortex
full transcript in developing wing discs. c/t heterozygotes are compared
with t/t homozygotes produced from c/t x t/t crosses (starting with exon
1B (a) or exon 1A (b)). Genotypes differ significantly for 1B full transcript
(P = 0.001, GLM), whereas genotypes do not differ for 1A full transcript
(P > 0.5, GLM). Note the differing y-axes scales. c, Sample sizes for cortex
qPCR experiments by wing disc developmental stage and carbonaria-locus
genotype.
Extended Data Figure 9 | Orthology and functional domain conservation 
of cortex protein. a, Schematic illustration, not to scale, of molecular 
features of B. betularia cortex protein sequence. b, Bootstrapped Maximum 
Likelihood consensus tree calculated with MEGA 6 of fzy/cortex derived 
from the propeller domain of the alignment in Supplementary Data 2. 
Branches are collapsed where partitions were reproduced in less than half 

of bootstrap replicates. Major groups containing lepidopteran cortex (black 
circles), non-lepidopteran cortex (red circles), fzr/rap (yellow circles) or 
fzy/cdc20/cdh1 proteins (green circles) are similarly unequivocally defined in 
trees obtained by neighbour joining or maximum parsimony methods (not 
shown). c, 3D protein sequence conservation mapping of lepidopteran cortex 
sequences onto a homology model of B. betularia cortex (top); all cortex 
sequences onto the same B. betularia model (middle); non-lepidopteran 
cortex sequences onto a model of D. melanogaster cortex (bottom). Molecular 
surfaces are shown in PyMOL using a spectrum from high (blue) to low (red) 
conservation. The mapping reveals the shared presence of a presumed inter- 
blade D box-like degron-binding site (pink segment is superimposed D box- 
mimicking sequence from the structures of human APC/C (PDB accession 
4ui9)*°). In contrast, there is much weaker conservation of surface regions 
corresponding to facial KEN box or helical specificity determinant sites 
(white and grey ribbons, respectively, from the same structure), suggesting 
that cortex proteins lack these functionalities. Note that the greater sequence 
variability in the non-lepidopteran set leads to lower overall sequence 
conservation (bottom) but that overall patterns in all panels are similar. 
Extended Data Table 1 | Predicted functionality of B. betularia cortex isoforms (starting with exon 1A or 1B)

Feature known in Cdh1/Cdc20 (function) and its potential conservation

Binding to APC/C:
  (1) [header illegible] (... of APC/C)
  (2) Segments 2&4 (bind Apc1)
  (3) [header illegible] (binds Apc3)
Binding to degrons (see Extended Data Fig. 9):
  (4) Inter-blade recognition site for LxExxxN degron
  (5) Facial recognition of KEN-box degron
  (6) Helical specificity determinant

Isoform*  Length (residues)  (1)  (2)  (3)  (4)  (5)  (6)
1A        441                ✓    ✗    ✓    ✓    ✗    ✗
1B        407                ?    ✗    ✓    ✓    ✗    ✗
2A        291                ?    ✗    ✓    ?†   ✗    ✗
2B        291                ✗    ?    ✓    ✗†   ✗    ✗
3A        323                ✓    ✗    ✓    ✗    ?    ✗
3B        289                ✗    ✗    ✓    ✗    ✗    ✗
4A        284                ?    ?    ?    ✗    ✗    ✗
4B        270                ✗    ✗    ?    ✗    ✗    ?
5A        402                ✗    ?    ✓    ✓    ✗    ✗
5B        270                ?    ✗    ?    ✗    ✗    ✗
6A        291                ✗    ✗    ?    ✗†   ✗    ?
6B        291                ✗    ✗    ✓    ✗†   ?    ✗

*Isoforms as defined in Extended Data Fig. 7.
†As the region lost from the propeller fold constitutes approximately a single blade, it is possible that these, and only these, truncated-propeller forms may still fold stably.
?, mark illegible in the source.
The gene cortex controls mimicry and crypsis in butterflies and moths
The wing patterns of butterflies and moths (Lepidoptera) are 
diverse and striking examples of evolutionary diversification by 
natural selection’. Lepidopteran wing colour patterns are a key 
innovation, consisting of arrays of coloured scales. We still lack a 
general understanding of how these patterns are controlled and 
whether this control shows any commonality across the 160,000 
moth and 17,000 butterfly species. Here, we use fine-scale mapping 
with population genomics and gene expression analyses to identify 
a gene, cortex, that regulates pattern switches in multiple species 
across the mimetic radiation in Heliconius butterflies. cortex 
belongs to a fast-evolving subfamily of the otherwise highly 
conserved fizzy family of cell-cycle regulators’, suggesting that it 
probably regulates pigmentation patterning by regulating scale 
cell development. In parallel with findings in the peppered moth 
(Biston betularia)*, our results suggest that this mechanism is 
common within Lepidoptera and that cortex has become a major 
target for natural selection acting on colour and pattern variation 
in this group of insects. 

In Heliconius, there is a major effect locus, Yb, that controls a diver- 
sity of colour pattern elements across the genus. It is the only locus 
in Heliconius that regulates all scale types and colours, including the 
diversity of white and yellow pattern elements in the two co-mimics 
H. melpomene and H. erato, and whole-wing variation in black, yellow, 
white, and orange/red elements in H. numata>~’. In addition, genetic 
variation underlying the Bigeye wing pattern mutation in Bicyclus any- 
nana, melanism in the peppered moth, Biston betularia, and mela- 
nism and patterning differences in the silkmoth, Bombyx mori, have all 
been localized to homologous genomic regions*"'° (Fig. 1). Therefore, 
this genomic region appears to contain one or more genes that act 
as major regulators of wing pigmentation and patterning across the 
Lepidoptera. 

Previous mapping of this locus in H. erato, H. melpomene and 
H. numata identified a genomic interval of about 1 Mb (refs 11-13) 
(Extended Data Table 1), which also overlaps with the 1.4-Mb region 
containing the carbonaria locus in B. betularia® and a 100-bp non- 
coding region containing the Ws mutation in B. mori’° (Fig. 1). We 
used a population genomics approach to identify the single nucleotide 
polymorphisms (SNPs) that were most strongly associated with phe- 
notypic variation within the approximately 1-Mb Heliconius interval. 
The diversity of wing patterning in Heliconius arises from divergence 
at wing pattern loci’, while convergent patterns generally involve the 
same loci and sometimes even the same alleles'*"!©. We used this pat- 
tern of divergence and sharing to identify SNPs associated with colour 


pattern elements across many individuals from a wide diversity of 
colour pattern phenotypes (Fig. 2). 

In three separate Heliconius species, our analysis consistently impli- 
cated the gene cortex as being involved in adaptive differences in wing 
colour pattern. In H. erato the strongest associations with the presence 
of a yellow hindwing bar were centred around the genomic region 
containing cortex (Fig. 2a). We identified 108 SNPs that were fixed for
one allele in H. erato favorinus, and fixed for the alternative allele in
all individuals lacking the yellow bar; the majority of these SNPs were
in introns of cortex (Extended Data Table 2). Fifteen SNPs showed a
similar fixed pattern for H. erato demophoon, which also has a yellow
bar. These SNPs did not overlap with those in H. erato favorinus,
consistent with the hypothesis that this phenotype evolved independently
in the two disjunct populations!”.

Previous work has suggested that alleles at the Yb locus are shared
between H. melpomene, the closely related species H. timareta, and
the more distantly related species H. elevatus, resulting in mimicry
among these species'®. Across these species, the strongest associations
with the yellow hindwing bar phenotype were again found at cortex
(Fig. 2d, Extended Data Fig. 1a and Extended Data Table 3). Similarly,
the strongest associations with the yellow forewing band were found
around the 5’ untranslated regions (UTRs) of cortex and HM00036,
an orthologue of the wash gene in Drosophila melanogaster. A single
SNP about 17 kb upstream of cortex (the closest gene) was perfectly
associated with the yellow forewing band across all H. melpomene,
H. timareta and H. elevatus individuals (Extended Data Figs 1a, 2 and
Extended Data Table 3). We found no fixed coding sequence variants at
cortex in larger samples (14-38 individuals) of H. melpomene aglaope
and H. melpomene amaryllis (Extended Data Fig. 3 and Supplementary
Information), which differ in Yb-controlled phenotypes’’,
suggesting that functional variants are likely to be regulatory rather
than coding. We found extensive transposable element variation
around cortex but it is unclear whether any of these are associated
with phenotypic differences (Extended Data Fig. 3, Extended Data
Table 4 and Supplementary Information).

Figure 1 | A homologous genomic region controls a diversity of
phenotypes across the Lepidoptera. Left, phylogenetic relationships’.
Right, chromosome maps with colour pattern intervals in grey; coloured
bars represent markers used to assign homology** '° and the first and last
genes from Fig. 2 are shown in red. In H. erato the HeCr locus controls the
yellow hindwing bar phenotype (grey boxed races). In H. melpomene it
controls both the yellow hindwing bar (HmYb, pink box) and the yellow
forewing band (HmN, blue box). In H. numata it modulates black, yellow
and orange elements on both wings (HnP), producing phenotypes that
mimic butterflies in the genus Melinaea. Morphs/races of Heliconius
species included in this study are shown with names. All images are by the
authors or are in the public domain.

Figure 2 | Association analyses across the genomic region known to
contain major colour pattern loci in Heliconius. a, Association in
H. erato with the yellow hindwing bar (n = 45). Coloured SNPs are fixed
for a unique state in H. erato demophoon (orange) or H. erato favorinus
(purple). b, Genes in H. erato with direct homologues in H. melpomene.
Genes are in different colours with exons (coding and UTRs) connected
by lines. Grey bars are transposable elements. c, H. melpomene genes and
transposable elements. Colours correspond to homologous H. erato genes
and microRNAs” are black. d, Association in the H. melpomene/timareta/
silvaniform group with the yellow hindwing bar (red) and yellow forewing
band (blue) (n = 49). e, Association in H. numata with the bicoloratus
morph (n = 26); inversion positions!’ shown below. In all cases black or
dark coloured points are above the strongest associations found outside
the colour pattern scaffolds (H. erato P= 1.63 x 10-5; H. melpomene
P=2.03 x 10-° and P=2.58 x 10~° for hindwing bar and forewing band,
respectively; H. numata P= 6.81 x 10~°). P values are from score tests for
association.
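
The fixed-difference criterion applied to the H. erato SNPs (fixed for one allele in the focal race, fixed for the alternative allele in all other individuals) can be illustrated with a minimal Python sketch. The function name and genotype encoding are hypothetical, the genotypes are invented, and the sketch assumes diploid calls with no missing data.

```python
def is_fixed_difference(focal_genotypes, other_genotypes):
    """True if each group carries a single allele and the two alleles differ.

    Genotypes are two-character strings such as "AA" or "AG".
    """
    focal_alleles = set("".join(focal_genotypes))
    other_alleles = set("".join(other_genotypes))
    return (
        len(focal_alleles) == 1
        and len(other_alleles) == 1
        and focal_alleles != other_alleles
    )


# Hypothetical genotypes at three SNPs: (focal race, all other individuals).
snps = {
    "snp_1": (["AA", "AA"], ["GG", "GG", "GG"]),  # fixed difference
    "snp_2": (["AA", "AG"], ["GG", "GG", "GG"]),  # focal race polymorphic
    "snp_3": (["CC", "CC"], ["CC", "CC", "CC"]),  # no difference at all
}
fixed = [name for name, (focal, others) in snps.items()
         if is_fixed_difference(focal, others)]
print(fixed)
```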

Finally, large inversions at the P supergene locus in H. numata (Fig. 1) 
are associated with different morphs". There is a steep increase in 
genotype-by-phenotype association at the breakpoint of inversion 1, 
consistent with the role of these inversions in reducing recombination 
(Fig. 2e). However, the bicoloratus morph can recombine with all other 
morphs across one or the other inversion, permitting finer-scale asso- 
ciation mapping of this region. As in H. erato and H. melpomene, this 
analysis showed a narrow region of associated SNPs corresponding 
exactly to the cortex gene (Fig. 2e), again with the majority of SNPs 
being found in introns (Extended Data Table 2). This associated region 
does not correspond to any other known genomic feature, such as an 
inversion or inversion breakpoint. 

To determine whether sequence variants around cortex were reg- 
ulating its expression, we investigated gene expression across the Yb 
locus. We used a custom designed microarray including probes from 
all predicted genes in the H. melpomene genome’ as well as probes 
tiled across the central portion of the Yb locus, focusing on two nat- 
urally hybridizing H. melpomene races (plesseni and malleti) that 
differ in Yb-controlled phenotypes’. cortex was the only gene across 
the entire interval to show significant differences in expression both 
between races with different wing patterns (false discovery rate (FDR) 
adjusted t-test P = 6.09 × 10⁻⁷) and between wing sections with
different pattern elements (FDR adjusted t-test P = 0.00224; Fig. 3).
This finding was reinforced in the tiled probe set, where we observed
strong differences in the expression of cortex exons and introns
but few differences outside this region (Extended Data Table 2).
cortex expression was higher in H. melpomene malleti than in
H. melpomene plesseni in all three wing sections used (but not eyes)
(Fig. 3c and Extended Data Fig. 4c). When different wing sections
were compared within each race, cortex expression in H. melpomene
malleti was higher in the distal section that contains the Yb-controlled
yellow forewing band than in the proximal section, consistent with
cortex producing this band. In contrast, H. melpomene plesseni, which
lacks the yellow band, had higher cortex expression in the proximal
forewing section than in the distal section (Fig. 3f and Extended Data
Fig. 4j). Differences in expression were found in pupal wings only on
days 1 and 3 but not on days 5 or 7 (Extended Data Fig. 4), similar to
the pattern observed previously for the transcription factor optix”®.

Figure 3 | Differential gene expression across the genomic region
known to contain major colour pattern loci in H. melpomene.
a-f, Expression differences in day 3 pupae, for all genes in the Yb interval
(a, d) and tiling probes spanning the central portion of the interval
(b, c, e, f). Expression is compared between races for each wing region
(a-c) and between proximal and distal forewing sections for each race
(d-f). c, f, Magnitude and direction of expression difference (log2 fold
change) for tiling probes showing significant differences (P < 0.05);
probes in known cortex exons shown in dark colours. Gene HM00052
was differentially expressed between other races in RNA-seq data
(Supplementary Information) but is not differentially expressed here.
P values are based on FDR-adjusted t-statistics.

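
The microarray comparisons are reported as FDR-adjusted P values, but this passage does not name the adjustment procedure. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up method; the function name and the input P values are hypothetical and do not come from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw P values for four probes.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print(adj)
```

With many probes tested across an interval, an adjustment of this kind controls the expected proportion of false positives among the probes called significant.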
Differential expression was not confined to the exons of cortex; 
the majority of differentially expressed probes in the tiling array cor- 
responded to cortex introns (Fig. 3). This differential expression of 
introns does not appear to be due to transposable element variation 
(Extended Data Table 2), but may be due to elevated background tran- 
scription and unidentified splice variants. PCR with reverse transcrip- 
tion (RT-PCR) revealed a diversity of splice variants (Extended Data 
Fig. 5), and their sequenced products included eight non-constitutive 
exons and six variable donor/acceptor sites, but we did not exhaus- 
tively sequence all transcripts (Supplementary Information). We can- 
not rule out the possibility that some of the differentially expressed 
intronic regions could be distinct non-coding RNAs. However, 
quantitative RT-PCR (qRT-PCR) in other hybridizing races with 
divergent Yb alleles (aglaope/amaryllis and rosina/melpomene) also
identified differences in cortex expression and allele-specific splicing
differences between both pairs of races (Extended Data Figs 1, 5 and
Supplementary Information).

Figure 4 | In situ hybridizations of cortex in hindwings of final instar
larvae. a, b, H. numata tarapotensis (replicated three times in the lab).
Adult wing shown in a; coloured points indicate landmarks, yellow arrows
highlight adult pattern elements corresponding to cortex staining.
c, d, H. melpomene rosina (replicated twice in the lab). Adult wing shown
in c; staining patterns in other H. melpomene races (meriana, n = 11, and
aglaope, n = 6) appeared similar. The probe used was complementary to
the cortex isoform with the longest open reading frame (also the most
common; see Extended Data Fig. 5).

Finally, in situ hybridization of cortex in final instar larval hindwing 
discs showed expression in wing regions fated to become black in the 
adult wing, most strikingly in their correspondence to the black pat- 
terns on adult H. numata wings (Fig. 4). In contrast, the array results 
from pupal wings were suggestive of higher expression in non-melanic 
regions. This may suggest that cortex is upregulated at different time- 
points in wing regions fated to become different colours. 

Overall, cortex shows significant differential expression and is the 
only gene in the candidate region to be consistently differentially 
expressed in multiple race comparisons and between differently pat- 
terned wing regions. Coupled with the strong genotype-by-phenotype 
associations across multiple independent lineages (Extended Data 
Table 1), these findings strongly implicate cortex as a major regulator 
of colour and pattern. However, we have not excluded the possibil- 
ity that other genes in this region also influence pigmentation pat- 
terning. A prominent role for cortex is also supported by studies in 
other taxa; our identification of distant 5’ untranslated exons of cortex 
(Supplementary Information) suggests that the 100-bp interval con- 
taining the Ws mutation in B. mori is likely to be within an intron of 
cortex and not in intergenic space, as previously thought”. In addition, 
fine mapping and gene expression also suggest that cortex controls 
melanism in the peppered moth‘. 

It seems likely that cortex controls pigmentation patterning by 
controlling scale cell development. The cortex gene falls in an insect- 
specific lineage within the fzy (also known as Cdc20/fizzy) family of 
cell-cycle regulators (Extended Data Fig. 6a). The phylogenetic tree 
of this gene family highlighted three major orthologous groups, two 
of which have highly conserved functions in cell-cycle regulation, 
mediated through interaction with the anaphase-promoting complex/ 
cyclosome (APC/C)*!. The third group, containing cortex proteins, is 
evolving rapidly, with low amino acid identity between D. melanogaster 
and H. melpomene cortex (14.1%), contrasting with much higher 
identities for orthologues between these species in the other two 
groups (fzy, 47.8% and rap (also known as fzr, cdh1, rap/Fzr), 47.2%; 
Extended Data Fig. 6a). D. melanogaster cortex acts through a similar 
mechanism to fzy to control meiosis in the female germ line’?”**. 
H. melpomene cortex also has some conservation of the fizzy family 
C-box and IR (isoleucine-arginine) tail elements (Supplementary 
Information) that mediate binding to the APC/C”’, suggesting that 
it may have retained a cell-cycle function, although we found that 
expressing H. melpomene cortex in D. melanogaster wings produced 
no detectable effect (Extended Data Fig. 6 and Supplementary 
Information). 
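
The amino acid identities quoted above are pairwise comparisons between orthologues. A minimal sketch of such a calculation is given below; it assumes two sequences that are already aligned to the same length and counts identity over columns where neither sequence has a gap. The denominator convention, the function name and the toy sequences are assumptions, since the source does not state how identity was computed.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over ungapped alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences carry a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Toy aligned peptides (hypothetical, not cortex sequences).
identity = percent_identity("MKT-LV", "MRTALV")
print(identity)
```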

Previously identified butterfly wing patterning genes have been 
transcription factors or signalling molecules”°*°. Developmental rate 
has long been thought to play a role in lepidopteran patterning”*”’, 
but cortex was not a likely a priori candidate, because its Drosophila 
orthologue has a highly specific function in meiosis**. The recruit- 
ment of cortex to wing patterning appears to have occurred before the 
major diversification of the Lepidoptera and this gene has repeatedly 
been targeted by natural selection’”””* to generate both cryptic* and 
aposematic patterns. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 


No statistical methods were used to predetermine sample size. 

H. erato Cr reference. Cr is the homologue of Yb in H. erato (Fig. 1). An existing 
reference for this region was available in three pieces*! (467,734 bp, 114,741 bp 
and 161,149 bp; GenBank KC469893.1). We screened the same bacterial artifi- 
cial chromosome (BAC) library used previously!!! using described procedures!! 
with probes designed to the ends of the existing BAC sequences and the HmYb 
BAC reference sequence. Two BACs (04B01 and 10B14) were identified as span- 
ning one of the gaps and sequenced using Illumina 2 x 250-bp paired-end reads 
collected on the Illumina MiSeq. The raw reads were screened to remove vector 
and Escherichia coli bases. The first 50,000 read pairs were taken for each BAC 
and assembled individually with the Phrap*” software and manually edited with 
consed**. Contigs with discordant read pairs were manually broken and properly 
merged using concordant read data. Gaps between contig ends were filled using 
an in-house finishing technique in which the terminal 200 bp of the contig ends 
were extracted and queried against the unused read data for spanning pairs, which 
were added using the addSolexaReads.perl script in the consed package. Finally, 
a single reference contig was generated by identifying and merging overlapping 
regions of the two consensus BAC sequences. 

To fill the remaining gap (between positions 800,387 and 848,446) we used 
the overhanging ends to search the scaffolds from a preliminary H. erato genome 
assembly of five Illumina paired-end libraries with different insert sizes (250, 500, 
800, 4,300 and 6,500 bp) from two related H. erato demophoon individuals. We 
identified two scaffolds (scf1869 and scf1510) that overlapped and spanned the 
gap (using 12,257 bp of the first scaffold and 35,803 bp of the second). 

The final contig was 1,009,595 bp in length, of which 2,281 bp were unknown
(N). The HeCr assembly was verified by aligning to the HmYb genome scaffold
(HE667780) with mummer and blast. The HeCr contig was annotated as described 
previously*! with some minor modifications. Briefly this involved first generating 
a reference-based transcriptome assembly with existing H. erato RNA-sequenced 
(RNA-seq) wing tissue (GenBank accession SRA060220). We used Trimmomatic*4 
(v0.22), and FLASh*? (v1.2.2) to prepare the raw sequencing reads, checking the 
quality with FastQC* (v0.10.0). We then used the Bowtie/TopHat/Cufflinks*”-” 
pipeline to generate transcripts for the unmasked reference sequence. We generated 
gene predictions with the MAKER pipeline” (v2.31). Homology and synteny in 
gene content with the H. melpomene Yb reference were identified by aligning the 
H. melpomene coding sequences to the H. erato reference with BLAST. Homologous 
genes were present in the same order and orientation in H. erato and H. melpomene 
(Fig. 2b, c). Annotations were manually adjusted if genes had clearly been merged 
or split in comparison to H. melpomene (which has been extensively manually 
curated!?). In addition, H. erato cortex was manually curated from the RNA-seq
data and using Exonerate"! alignments of the H. melpomene protein and mRNA 
transcripts, including the 5’ UTRs. 
Genotype-by-phenotype association analyses. Information on the individuals 
used and ENA accessions for sequence data are given in Supplementary Table 1. 
We used shotgun Illumina sequence reads from 45 H. erato individuals from 7 races 
that were generated as part of a previous study*! (Supplementary Information). 
Reads were aligned to an H. erato reference containing the Cr contig and other 
sequenced H. erato BACs!!! with BWA”, which has previously been found to 
work better than Stampy*? (which was used for the alignments in the other species) 
with an incomplete reference sequence*'. The parameters used were as follows: 
maximum edit distance (n), 8; maximum number of gap opens (o), 2; maximum
number of gap extensions (e), 3; seed (l), 35; maximum edit distance in seed (k), 2.
We then used Picard tools to remove PCR and optical duplicate sequence reads and 
GATK* to re-align indels and call SNPs using all individuals as a single population. 
Expected heterozygosity was set to 0.2 in GATK. 132,397 SNPs were present across 
Cr. A further 52,698 SNPs not linked to colour pattern loci were used to establish 
background association levels. 
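
The alignment settings listed above map onto the single-letter options of `bwa aln`. The following is a minimal sketch of how such a command line could be assembled; the file names are hypothetical placeholders, and the sketch only builds the command rather than reproducing the pipeline actually run.

```python
import shlex


def bwa_aln_command(reference, reads, out_sai):
    """Build a `bwa aln` command line using the settings quoted in the text."""
    params = {
        "-n": 8,   # maximum edit distance
        "-o": 2,   # maximum number of gap opens
        "-e": 3,   # maximum number of gap extensions
        "-l": 35,  # seed length
        "-k": 2,   # maximum edit distance in the seed
    }
    cmd = ["bwa", "aln"]
    for flag, value in params.items():
        cmd += [flag, str(value)]
    # -f writes the alignment index to a file instead of stdout
    cmd += ["-f", out_sai, reference, reads]
    return cmd


# Hypothetical file names, for illustration only
cmd = bwa_aln_command("He_Cr_reference.fa", "sample_R1.fastq", "sample_R1.sai")
print(shlex.join(cmd))
```

Returning the command as a list keeps it safe to pass to `subprocess.run` without shell quoting problems.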

For the H. melpomene/H. numata clade we used previously published sequence 
data from 19 individuals from enrichment sequencing targeting the Yb region, 
the unlinked HmB/D region that controls the presence or absence of red col- 
our pattern elements, and ~1.8 Mb of non-colour pattern genomic regions", as 
well as 9 whole-genome shotgun-sequenced individuals'*“°. We added targeted 
sequencing and shotgun whole-genome sequencing of an additional 47 individuals 
(Supplementary Information). Alignments were performed using Stampy* with 
default parameters except for substitution rate which was set to 0.01. We again 
removed duplicates and used GATK to re-align indels and call SNPs with expected 
heterozygosity set to 0.1. 

The analysis of H. melpomene/timareta/silvaniform included 49 individuals, 
which were aligned to v1.1 of the H. melpomene reference genome with the scaf- 
folds containing Yb and HmB/D swapped with reference BAC sequences'®, which 
contained fewer gaps of unknown sequence than the genome scaffolds. The Yb 
region contained 232,631 SNPs and a further 370,079 SNPs were used to establish 
background association levels. 

The H. numata analysis included 26 individuals aligned to unaltered v1.1 of 
the H. melpomene reference genome, because the genome scaffold containing Yb 
is longer than the BAC reference, making it easier to compare the inverted and 
non-inverted regions in this species. We tested for associations at 262,137 SNPs 
on the Yb scaffold with the H. numata bicoloratus morph, which had a sample size 
of 5 individuals. 

We measured associations between genotype and phenotype using a score test
(qtscore) in the GenABEL package in R (ref. 47). This was corrected for back-
ground population structure using a test-specific inflation factor (λ) calculated
from the SNPs unlinked to the major colour pattern controlling loci (described
above), as the colour pattern loci are known to have a different population structure
from the rest of the genome!*!>!8. We used a custom perl script to convert GATK
vcf files to Illumina SNP format for input to GenABEL (ref. 47). GenABEL does not
accept multiallelic sites, so the script also converted the genotype of any individuals 
for which a third (or fourth) allele was present to a missing genotype (with these 
defined as the lowest frequency alleles). Custom R scripts were used to identify sites 
showing perfect associations with calls for >75% of individuals. 
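
The two filtering steps described above can be sketched as follows. This is an illustration under stated assumptions, not the custom perl and R scripts themselves: genotypes are represented as allele pairs, the two most frequent alleles are retained, and a "perfect association" is taken to mean that no genotype class is shared between phenotype groups. The function names and that definition are hypothetical.

```python
from collections import Counter


def collapse_to_biallelic(genotypes):
    """Keep the two most frequent alleles at a site.

    genotypes: list of (allele, allele) tuples, or None for a missing call.
    Individuals carrying a third or fourth (lowest-frequency) allele are
    set to missing, because GenABEL does not accept multiallelic sites.
    """
    counts = Counter(a for g in genotypes if g is not None for a in g)
    keep = {allele for allele, _ in counts.most_common(2)}
    return [g if g is not None and set(g) <= keep else None for g in genotypes]


def perfect_association(genotypes, phenotypes, min_call_rate=0.75):
    """True if genotype classes separate the phenotype groups completely.

    Assumed definition: at least two phenotype groups, no genotype class
    shared between groups, and calls present for >75% of individuals.
    """
    called = [(g, p) for g, p in zip(genotypes, phenotypes) if g is not None]
    if len(called) <= min_call_rate * len(genotypes):
        return False
    by_pheno = {}
    for g, p in called:
        by_pheno.setdefault(p, set()).add(tuple(sorted(g)))
    groups = list(by_pheno.values())
    if len(groups) < 2:
        return False
    return all(
        groups[i].isdisjoint(groups[j])
        for i in range(len(groups))
        for j in range(i + 1, len(groups))
    )


site = [("A", "A"), ("A", "G"), ("G", "G"), ("A", "T"), None]
print(collapse_to_biallelic(site))
```
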
Microarray gene expression analyses. We designed a Roche NimbleGen microarray
(12 × 135K format) with probes for all annotated H. melpomene genes!* and 
tiling of the central portion of the Yb BAC sequence contig that was previously 
identified as showing the strongest differentiation between H. melpomene races”. 
In addition to the HmYb tiling array probes there were 6,560 probes tiling HmAc 
(a third unlinked colour pattern locus) and 10,716 probes tiling HmB/D, again 
distanced on average at 10-bp intervals. The whole-genome gene expression array 
contained 107,898 probes in total. 

This array was interrogated with Cy3-labelled double-stranded cDNA gener- 
ated from total RNA (with a SuperScript double-stranded cDNA synthesis kit 
(Invitrogen) and a one-colour DNA labelling kit (NimbleGen)) from four pupal
developmental stages of H. melpomene plesseni and malleti. Pupae were from captive
stocks maintained in insectary facilities in Gamboa, Panama. Tissue was stored in
RNAlater (Ambion) at −80 °C before RNA extraction. RNA was extracted using
TRIzol (Invitrogen) followed by purification with RNeasy (Qiagen) and DNase 
treated with DNA-free (Ambion). Quantification was performed using a Qubit 
2.0 fluorometer (Invitrogen) and purity and integrity assessed using a Bioanalyzer 
2100 (Agilent). Samples were randomized and each hybridized to a separate array. 
The HmYb probe array contained 9,979 probes distanced on average at 10 bp. The 
whole-genome expression array contained on average 9 probes per annotated gene 
in the genome (v.1.1 (ref. 18)) as well as any transcripts not annotated but predicted 
from RNA-seq evidence. 

Background corrected expression values for each probe were extracted using 
NimbleScan software (v.2.3). Analyses were performed with the LIMMA package 
implemented in R/Bioconductor™. The tiling array and whole-genome data sets 
were analysed separately. Expression values were extracted and quantile-normalized,
log2-transformed, quality controlled and analysed for differences in expression
between individuals and wing regions. P values were adjusted for multiple
hypothesis testing using the false discovery rate (FDR) method”. 
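
As an illustration of the multiple-testing adjustment, the sketch below implements the Benjamini-Hochberg step-up procedure in plain Python. That the FDR method used here is Benjamini-Hochberg is an assumption; the analysis itself was run in LIMMA, not with this code.

```python
def benjamini_hochberg(pvalues):
    """Return FDR-adjusted P values (Benjamini-Hochberg step-up procedure)."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(adj)
```

Adjusted values are returned in the same order as the input, so they can be matched back to probes directly.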

In situ hybridization. H. numata and H. melpomene larvae were reared in a 
greenhouse at 25-30°C and sampled at the last instar. In situ hybridizations were 
performed according to previously described methods”? with a cortex riboprobe 
synthesized from a 831-bp cDNA amplicon from H. numata. Wing discs were incu- 
bated in a standard hybridization buffer containing the probe for 20-24h at 60°C. 
For secondary detection of the probe, wing discs were incubated in a 1:3,000 dilu- 
tion of anti-digoxigenin alkaline phosphatase Fab fragments and stained with BM 
Purple for 3-6h at room temperature. Stained wing discs were photographed with 
a Leica DFC420 digital camera mounted on a Leica Z6 APO stereomicroscope. 
De novo assembly of short read data in H. melpomene and related taxa. To 
better characterize indel variation from the short-read sequence data used for the 
genotype-by-phenotype association analysis, we performed de novo assemblies of a 
subset of H. melpomene individuals and related taxa with a diversity of phenotypes 
(Extended Data Fig. 2). Assemblies were performed using the de novo assembly
function of CLC Genomics Workbench v.6.0 under default parameters. The 
assembled contigs were then BLASTed against the Yb region of the H. melpomene 
melpomene genome’’, using Geneious v.8.0. The contigs identified by BLAST were 
then concatenated to generate an allele sequence for each individual. Occasionally 
two unphased alleles were generated when two contigs were matched to a given 
region. If more than two contigs of equal length matched then this was considered 
an unresolvable repeat region and replaced with Ns. The assembled alleles were 
then aligned using the MAFFT alignment plugin in Geneious v.8.0. 

Long-range PCR targeted sequencing of cortex in H. melpomene aglaope and 
H. melpomene amaryllis. We generated two long-range PCR products covering 
88.8% of the 1,344-bp coding region of cortex (excluding 67 bp at the 5’ end and 
83bp at the 3’ end; see Supplementary Information). A product spanning coding 
exons 5-9 (the final exon) was obtained from 29 H. melpomene amaryllis indi- 
viduals and 29 H. melpomene aglaope individuals; a product spanning coding 
exons 2-5 was obtained from 32 H. melpomene amaryllis individuals and 14 
H. melpomene aglaope individuals. In addition, a product spanning exons 4-6 was 
obtained from six H. melpomene amaryllis and five H. melpomene aglaope indi- 
viduals that failed to amplify one or both of the larger products. Long-range PCR 
was performed using Extensor long-range PCR mastermix (Thermo Scientific) 
following the manufacturer’s guidelines with a 60°C annealing temperature in 
a 10–20-µl volume. The product spanning coding exons 5-9 was obtained with 
primers HM25_long_F1 and HM25_long_R4 (see Supplementary Table 2 for 
primer sequences); the product spanning coding exons 2-5 was obtained with 
primers HM25_long_F4 and HM25_long_R2; the product spanning exons 4-6 
was obtained with primers 25_ex5-ex7_rl and 25_ex5-ex7_fl. Products were 
pooled for each individual, including five additional products from the Yb locus 
and seven products in the region of the HmB/D locus. They were then cleaned 
using QIAquick PCR purification kit (QIAgen) before being quantified with a 
Qubit Fluorometer (Life Technologies) and pooled in equimolar amounts for 
each individual, taking into account variation in the length and number of PCR 
products included for each individual (because of some PCR failures, that is, 
proportionally less DNA was included if some PCR products were absent for a 
given individual). 

Products were pooled within individuals (including additional products for other 
genes not analysed here) and then quantified and pooled in equimolar amounts 
for each individual within each race. The pooled products for each race (H. mel- 
pomene aglaope and amaryllis) were then prepared as two separate libraries with 
molecular identifiers and sequenced on a single lane of an Illumina GAIIx. Analysis 
was performed using Galaxy and the history is available at https://usegalaxy.org/u/ 
njnadeau/h/long-pcr-final. Reads were quality filtered with a minimum quality of 
20 required over 90% of the read, which resulted in 5% of reads being discarded. 
Reads were then quality trimmed to remove bases with quality less than 20 from 
the ends. They were then aligned to the target regions using the fosmid sequences 
from known races** with sequence from the Yb BAC walk” used to fill any gaps. 
Alignments were performed with BWA v.0.5.6 (ref. 42) and converted to pileup 
format using Samtools v.0.1.12 before being filtered on the basis of quality (>20) 
and coverage (>10). BWA alignment parameters were as follows: fraction of missing 
alignments given 2% uniform base error rate (aln -n) 0.01; maximum number of 
gap opens (aln -o) 2; maximum number of gap extensions (aln -e) 12; disallow long 
deletion within 12 bp towards the 3’-end (aln -d); number of first subsequences to 
take as seed (aln -l) 100. We then calculated coverage and minor allele frequencies
for each race and the difference between these using custom scripts in R (ref. 50).
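
The per-site comparison between the two pooled races can be sketched as below. This is a hypothetical Python illustration rather than the custom R scripts: base counts per position are assumed to have been parsed from the filtered pileup, and the minor allele is assumed to be defined on the combined counts of both races.

```python
def allele_frequency_difference(counts_a, counts_b, min_coverage=10):
    """Absolute difference in minor allele frequency between two pools.

    counts_a, counts_b: dict mapping base -> read count at one position.
    Returns None if either pool fails the coverage filter (coverage > 10).
    """
    cov_a = sum(counts_a.values())
    cov_b = sum(counts_b.values())
    if cov_a <= min_coverage or cov_b <= min_coverage:
        return None
    bases = set(counts_a) | set(counts_b)
    combined = {b: counts_a.get(b, 0) + counts_b.get(b, 0) for b in bases}
    ranked = sorted(combined, key=combined.get, reverse=True)
    # Monomorphic site: no minor allele, so no difference
    if len(ranked) < 2 or combined[ranked[1]] == 0:
        return 0.0
    minor = ranked[1]
    return abs(counts_a.get(minor, 0) / cov_a - counts_b.get(minor, 0) / cov_b)


# Illustrative counts only, not data from the study
print(allele_frequency_difference({"A": 30, "G": 10}, {"A": 40}))
```

A value of 1 indicates a fixed difference between the pools at that position, and 0 indicates identical frequencies.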
Sequencing and analysis of H. melpomene fosmid clones. Fosmid libraries had 
previously been made from single individuals of three H. melpomene races (rosina, 
amaryllis and aglaope) and several clones overlapping the Yb interval had been 
sequenced*®. We extended the sequencing of this region, particularly the region 
overlapping cortex, by sequencing an additional four clones from H. melpomene 
rosina (1051_83D21, accession KU514430; 1051_97A3, accession KU514431; 
1051_65N6, accession KU514432; 1051_93D23, accession KU514433), two clones 
from H. melpomene amaryllis (1051_13K4, accession KU514434; 1049_8P23, 
accession KU514435) and three clones from H. melpomene aglaope (1048_80B22, 
accession KU514437; 1049_19P15, accession KU514436; 1048_96A7, accession 
KU514438). These were sequenced on a MiSeq 2000, and assembled using the 
de novo assembly function of CLC Genomics Workbench v.6.0. The individual 
clones (including existing clones 1051-143B3, accession FP578990; 1049-27G11, 
accession FP700055; and 1048-62H20, accession FP565804) were then aligned to 
the BAC and genome scaffold’ references using the MAFFT alignment plugin of 
Geneious v.8.0. Regions of general sequence similarity were identified and visu- 
alized using MAUVE". We merged overlapping clones from the same individual 
if they showed no sequence differences, indicating that they came from the same 
allele. We identified transposable elements using nBLAST with an insect transpos- 
able element list downloaded from Repbase Update, including known Heliconius- 
specific transposable elements”’. 

5’ RACE, RT-PCR and qRT-PCR. All tissues used for gene expression analyses 
were dissected from individuals from captive stocks derived from wild-caught 
individuals of various races of H. melpomene (aglaope, amaryllis, melpomene, 
rosina, plesseni and malleti) and F2 individuals from a H. melpomene rosina
(female) × H. melpomene melpomene (male) cross. Experimental individuals
were reared at 28-31 °C. Developing wings were dissected and stored in RNAlater 
(Ambion Life Technologies). RNA was extracted using a QIAgen RNeasy Mini 
kit following the manufacturer’s guidelines and treated with TURBO DNA-free 
DNase kit (Ambion Life Technologies) to remove remaining genomic DNA. RNA
quantification was performed with a Nanodrop spectrophotometer, and the RNA
integrity was assessed using the Bioanalyzer 2100 system (Agilent).

Total RNA was thoroughly checked for DNA contamination by performing 
PCR for EF1α (using primers efl-a_RT_for and efl-a_RT_rev, Supplementary
Table 2) with 0.5 µl of RNA extract (50 ng–1 µg of RNA) in a 20-µl reaction using a
polymerase enzyme that is not functional with RNA template (BioScript, Bioline 
Reagents Ltd). If a product amplified within 45 cycles then the RNA sample was 
re-treated with DNase. 

Single-stranded cDNA was synthesized using BioScript MMLV Reverse 
Transcriptase (Bioline Reagents Ltd) with random hexamer (N6) primers and 
1 µg of template RNA from each sample in a 20-µl reaction volume following the
manufacturer’s protocol. The resulting cDNA samples were then diluted 1:1 with
nuclease-free water and stored at −80 °C.

5’ RACE (rapid amplification of cDNA ends) was performed using RNA 
from hindwing discs from one H. melpomene aglaope and one H. melpomene 
amaryllis final instar larvae with a SMARTer RACE kit from Clontech. The 
gene-specific primer used for the first round of amplification was anchored in 
exon 4 (fzl_raceex5_R1; Supplementary Table 2). Secondary PCR of these products 
was then performed using a primer in exon 2 (HM25_long_F2; Supplementary 
Table 2) and the nested universal primer A. Other isoforms were detected by 
RT-PCR using primers within exons 2 and 9 (gene25_for_fulll and gene25_rev_ 
ex3). We identified isoforms from 5’ RACE and RT-PCR products by cutting indi- 
vidual bands from agarose gels and if necessary by cloning products before Sanger 
sequencing. Cloning of products was performed using TOPO TA (Invitrogen) or 
pGEM-T (Promega) cloning kits. Sanger sequencing was performed using BigDye 
terminator v3.1 (Applied Biosystems) run on an ABI 3730 capillary sequencer. 
Primers fzl_exla_F1 and fzl_ex4_R1 were used to confirm expression of the fur- 
thest 5’ UTR. For isoforms that appeared to show some degree of race specific- 
ity, we designed isoform-specific PCR primers spanning specific exon junctions 
(Extended Data Figs 2, 4 and Supplementary Table 2) and used these to either 
qualitatively (RT-PCR) or quantitatively (qRT-PCR) assess differences in expres- 
sion between races. 

We performed qRT-PCR using SensiMix SYBR green (Bioline Reagents Ltd) 
with 0.2–0.25 µM of each primer and 1 µl of the diluted product from the cDNA
reactions. Reactions were performed in an Opticon 2 DNA engine (MJ Research)
with the following cycling parameters: 95°C for 10 min; 35–50 × (95°C for 15s,
55-60°C for 30s, 72°C for 30s); 72°C for 5 min. Melting curves were generated 
between 55°C and 90°C with readings taken every 0.2°C for each of the products 
to check that a single product was generated. At least one product from each set 
of primers was also run on a 1% agarose gel to check that a single product of the 
expected size was produced and the identity of the product was confirmed by direct 
sequencing (see Supplementary Table 2 for details of primers for each gene). We 
used two housekeeping genes (EF1α and RPS3A) for normalization and all results 
were taken as averages of triplicate PCR reactions for each sample. 

$C_t$ values were defined as the point at which fluorescence crossed a threshold
($R_{C_t}$) adjusted manually to be the point at which fluorescence rose above the
background level. Amplification efficiencies ($E$) were calculated using a dilution
series of clean PCR products. Starting fluorescence, which is proportional to the
starting template quantity, was calculated as $R_0 = R_{C_t}(1 + E)^{-C_t}$. Normalized
values were then obtained by dividing $R_0$ values for the target loci by $R_0$ values
for EF1α and RPS3A. Results from both of these controls were always very similar,
so the results presented are normalized to the mean of EF1α and RPS3A. All results were taken
as averages of triplicate PCR reactions. If one of the triplicate values was more 
than one cycle away from the mean then this replicate was excluded. Similarly any 
individuals that were more than two s.d. away from the mean of all individuals 
for the target or normalization genes were excluded (these are not included in the 
numbers of individuals reported). Statistical significance was assessed by Wilcoxon 
rank sum tests performed in R (ref. 50). 
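
The quantification steps above can be sketched in a few lines. This is a minimal illustration with hypothetical function names and made-up input values; the replicate rule is interpreted here as dropping only the single most deviant replicate when it lies more than one cycle from the triplicate mean.

```python
def mean_ct(triplicate, max_dev=1.0):
    """Average triplicate Ct values, excluding at most one outlying replicate.

    Assumption: only the most deviant replicate is removed, and only if it
    is more than `max_dev` cycles away from the mean of all three.
    """
    mean_all = sum(triplicate) / len(triplicate)
    worst = max(triplicate, key=lambda c: abs(c - mean_all))
    kept = list(triplicate)
    if abs(worst - mean_all) > max_dev:
        kept.remove(worst)
    return sum(kept) / len(kept)


def starting_fluorescence(threshold, efficiency, ct):
    """R0 = R_Ct * (1 + E) ** (-Ct), proportional to starting template."""
    return threshold * (1.0 + efficiency) ** (-ct)


def normalized_expression(target_r0, reference_r0s):
    """Divide the target R0 by the mean R0 of the reference genes."""
    return target_r0 / (sum(reference_r0s) / len(reference_r0s))


# Illustrative numbers only: perfect doubling (E = 1), threshold of 1.0
r0 = starting_fluorescence(1.0, 1.0, 3)
print(r0, normalized_expression(r0, [0.25, 0.25]))
```

With E = 1 each cycle doubles the product, so a Ct of 3 at a threshold of 1.0 corresponds to a starting fluorescence of 1/8.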
RNA-seq analysis of H. melpomene amaryllis/aglaope. RNA-seq data for hind- 
wings from three developmental stages had previously been obtained for two 
individuals of each race at each stage (12 individuals in total) and used in the anno- 
tation of the H. melpomene genome'® (deposited in ENA under study accessions 
ERP000993 and PRJEB7951). Four samples were multiplexed on each sequencing 
lane with the fifth instar larval and day 2 pupal samples sequenced on a GAIIx 
sequencer and the day 3 pupal wings sequenced on a Hiseq 2000 sequencer. 

Two methods were used for alignment of reads to the reference genome 
and inferring read counts: Stampy*? and RSEM (RNaseq by Expectation 
Maximisation)™. In addition we used two different R/Bioconductor packages for 
estimation of differential gene expression: DESeq* and BaySeq>®. Read bases with 
quality scores <20 were trimmed with FASTX-Toolkit (http://hannonlab.cshl.edu/ 
fastx_toolkit/index.html). Stampy was run with default parameters except for mean 
insert size, which was set to 500, s.d. 100, and substitution rate, which was set to 
0.01. Alignments were filtered to exclude reads with mapping quality <30 and
sorted using Samtools°”. We used the htseq-count script with HTSeq** to infer
counts per gene from the BAM files. 

RSEM* was run with default parameters to infer a transcriptome and then map 
RNA-seq reads against this using Bowtie*’ as an aligner. This was run with default 
parameters except for the maximum number of mismatches, which was set to 3. 
Annotation and alignment of fizzy family proteins. In the arthropod genomes, 
some fizzy family proteins were found to be poorly annotated based on alignments 
to other family members. In these cases annotations were improved using well-an- 
notated proteins from other species as references in the program Exonerate*! and 
the outputs were manually curated. Specifically, the annotation of B. mori rap 
(also known as fzr) was extended based on alignment of Danaus plexippus rap; 
the annotation of B. mori fzy was altered based on alignment of D. melanogaster 
and D. plexippus fzy; H. melpomene fzy was identified as part of the annotated 
gene HMEL017486 on scaffold HE671623 (Hmel v.1.1) based on alignment of D. 
plexippus fzy; the Apis mellifera rap annotation was altered based on alignment of 
D. melanogaster rap; the annotation of Acyrthosiphon pisum rap was altered based 
on alignment of D. melanogaster rap. No one-to-one orthologues of D. melano- 
gaster fzr2 were found in any of the other arthropod genera, suggesting that this 
gene is Drosophila-specific. Multiple sequence alignment of all the fizzy family 
proteins was then performed using the Expresso server within T-coffee®, and 
this alignment was used to generate a neighbour joining tree in Geneious v.8.1.7. 
Expression of H. melpomene cortex in D. melanogaster wings. D. melanogaster 
cortex is known to generate an irregular microchaete phenotype when ectopically 
expressed in the posterior compartment of the adult fly wing’’. We performed the 
same assay using H. melpomene cortex to test whether this functionality was conserved.
Following the methods of Swan and Schüpbach”, we created an upstream
activating sequence (UAS)-GAL4 construct using the coding region for the long 
isoform of H. melpomene cortex, plus a Drosophila cortex version to act as positive 
control. The haemagglutinin (HA)-tagged H. melpomene UAS-cortex expression 
construct was generated using cDNA reverse transcribed (Revert-Aid, Thermo- 
Scientific) from RNA extracted (Qiagen RNeasy) from pre-ommochrome pupal 
wing material. An HA-tagged D. melanogaster UAS-cortex version was also con- 
structed“. Expression was driven by the hsp70 promoter. Constructs were injected 
into C31-attP40 flies (25709, Bloomington Stock Centre; Cambridge University 
Genetics Department, UK, fly injection service) by site-directed insertion into 
CII via an attB site in the construct. Homozygous transgenic flies were crossed 
with w,y/;en-GAL4;UAS-GFP flies (gift from M. Landgraf laboratory, Cambridge 
University Zoology Department) to drive expression in the engrailed posterior 
domain of the wing, and adult offspring wings were photographed (Extended Data 
Fig. 6b-d). Expression of the construct was confirmed by immunohistochemistry 
(using the standard Drosophila protocol) against an HA tag inserted at the N ter- 
minus of the protein, using final instar larval wing discs with mouse anti-HA and 
goat anti-mouse alexa-fluor 568 secondary antibodies (Abcam), imaged by Leica 
SP5 confocal (Extended Data Fig. 6e). 
Extended Data Figure 1 | H. melpomene race-associated cortex splicing
variation. a, Exons and splice variants of cortex in H. melpomene.
Orientation is reversed with respect to Figs 2 and 4, with transcription
going from left to right. SNPs showing the strongest associations with
phenotype are shown with stars. b, Differential expression of two regions
of cortex between whole hindwings of H. melpomene amaryllis and
H. melpomene aglaope (n = 11 and n = 10, respectively). Box plots
are standard (median; seventy-fifth and twenty-fifth percentiles;
maximum and minimum excluding outliers (shown as discrete points)).
* P < 0.0001, *P < 0.05, Wilcoxon rank sum test. c, Expression of a
cortex isoform lacking exon 3 is found in H. melpomene aglaope but not 
H. melpomene amaryllis hindwings. d, Expression of an isoform lacking 
exon 5 is found in H. melpomene rosina but not H. melpomene melpomene 
hindwings. Green triangles indicate predicted start codons and red 
triangles predicted stop codons, with usage dependent on which exons 
are present in the isoform. Schematics of the targeted exons are shown 
for each (q)RT-PCR product; black triangles indicate the positions of the 
primers used in the assay. 


© 2016 Macmillan Publishers Limited. All rights reserved 


[Figure: alignment panels for fragments Yb_Walk_Frag_A, Yb_Walk_Frag_B and Yb_Walk_Frag_C, each showing consensus, identity and per-individual sequences.]

Extended Data Figure 2 | Alignments of de novo assembled fragments
containing the top associated SNPs from H. melpomene and related taxa
short-read data. Identified indels do not show stronger associations with
phenotype than those seen at SNPs (as shown in Extended Data Table 2),
although some near-perfect associations are seen in fragment C. Black
regions, missing data; yellow boxes, individuals with a yellow hindwing
bar; blue boxes, individuals with a yellow forewing band.
Extended Data Figure 3 | Sequencing of long-range PCR products and
fosmids spanning cortex. a, Sequence read coverage from long-range
PCR products across the cortex coding region from two H. melpomene
races. b, Minor allele frequency difference from these reads between
H. melpomene aglaope and H. melpomene amaryllis. Exons of cortex are
indicated by boxes, numbered as in Extended Data Fig. 2. c, Alignments of
sequenced fosmids overlapping cortex from three H. melpomene (H. m.)
individuals of different races. No major rearrangements are observed,
nor any major differences in transposable element (TE) content between
closely related races with different colour patterns (melpomene/rosina
or amaryllis/aglaope). H. melpomene amaryllis and rosina have the
same phenotype, but do not share any transposable elements that are
not present in the other races. Hm_BAC, BAC reference sequence;
Hm_mel, melpomene from new unpublished assembly of H. melpomene
genome”!; Hm_ros, rosina (two different alleles were sequenced from
this individual); Hm_ama, amaryllis (two non-overlapping clones were
sequenced from this individual); Hm_agla, aglaope (four clones were
sequenced from this individual, of which two represent alternative
alleles). Alignments were performed with Mauve; coloured bars represent
homologous genomic regions. cortex is annotated in black above each
clone. Variable transposable elements are shown as coloured bars
below each clone: red, Metulj-like non-LTR; yellow, Helitron-like DNA;
grey, other.
Extended Data Figure 4 | Expression array results for additional stages.
Array results are related to Fig. 4. a-g, Comparisons between races
(H. melpomene plesseni and H. melpomene malleti) for three wing regions.
h-n, Comparisons between proximal and distal forewing regions for each
race. Significance values (−log10 P) are shown separately for genes in the
HmYb region from the gene array (a, d, f, h, k, m) and for the HmYb tiling
array (b, e, g, i, l, n) for day 1 (a, b, h, i), day 5 (d, e, k, l) and day 7
(f, g, m, n) after pupation. The level of expression difference (log fold
change) for tiling probes showing significant differences (P < 0.05) is
shown for day 1 (c and j) with probes in known cortex exons shown in
dark colours and probes elsewhere shown as pale colours. P values are
based on FDR-adjusted t-statistics.
Extended Data Figure 5 | Alternative splicing of cortex. a, Amplification 
of the whole cortex coding region, showing the diversity of isoforms and 
variation between individuals. b, Differences in splicing of exon 3 between 
H. melpomene aglaope and H. melpomene amaryllis. Products amplified with 
a primer spanning the exon 2-4 junction at three developmental stages. The 
lower panel shows verification of this assay by amplification between exons 
2 and 4 for the same final instar larval samples (replicated in Extended Data 
Fig. 2c). c, Lack of consistent differences between H. melpomene melpomene 
and H. melpomene rosina in splicing of exon 3. Top panel shows products 
amplified with a primer spanning the exon 2-4 junction; lower panel shows 
the same samples amplified between exons 2 and 4. d, Differences in splicing 
of exon 5 between H. melpomene melpomene and H. melpomene rosina. 
Products amplified with a primer spanning the exon 4-6 junction at three
developmental stages. e, Subset of samples from d amplified with
primers between exons 4 and 6 for verification (middle, 24-h pupae samples
are replicated in Extended Data Fig. 2d). f, Lack of consistent differences
between H. melpomene aglaope and H. melpomene amaryllis in splicing of
exon 5. Products amplified with a primer spanning the exon 4-6 junction.
g, H. melpomene cythera also expresses the isoform lacking exon 5, while
a pool of six H. melpomene malleti individuals do not. h, Expression of the
isoform lacking exon 5 from an F2 H. melpomene melpomene × H. melpomene
rosina cross. Individuals homozygous or heterozygous for the H. melpomene
rosina HmYb allele express the isoform while those homozygous for the
H. melpomene melpomene HmYb allele do not. i, Allele-specific expression of
isoforms with and without exon 5. Heterozygous individuals (indicated with 
blue and red stars) express only the H. melpomene rosina allele in the isoform 
lacking exon 5 (G at highlighted position), while they express both alleles in 
the isoform containing exon 5 (G/A at this position). 
Extended Data Figure 6 | Phylogeny of fizzy family proteins and effects
of expressing cortex in the Drosophila wing. a, Neighbour joining
phylogeny of fizzy family proteins including functionally characterized
proteins (in bold) from Saccharomyces cerevisiae, Homo sapiens and
D. melanogaster as well as copies from the basal metazoan Trichoplax
adhaerens and a range of annotated arthropod genomes (Daphnia
pulex, Acyrthosiphon pisum, Pediculus humanus, Apis mellifera, Nasonia
vitripennis, Anopheles gambiae and Tribolium castaneum) including the
lepidoptera H. melpomene (in blue), D. plexippus and B. mori. Branch
colours: dark blue, cdc20/fzy; light blue, rap; red, lepidopteran cortex.
b-e, Ectopic expression of cortex in D. melanogaster. Drosophila cortex
produces an irregular microchaete phenotype when expressed in the
posterior compartment of the fly wing (c) whereas Heliconius cortex does
not (d), when compared to no expression (b). A, anterior; P, posterior.
Successful Heliconius cortex expression was confirmed by anti-HA
immunohistochemistry in the last instar Drosophila larva wing imaginal
disc (e, red), with DAPI staining in blue.
Extended Data Table 1 | Genes in the Yb region and evidence for wing patterning control in Heliconius


Heliconius melpomene H. erato Hn 
Hmgene ID He gene ID Putative gene name Yb) Sblo AY AN Et Ew gs Ew ET cr A Ae Cp! ae 
HM00002 HERA000036 Acylpeptide hydrolase 2 x 
HM00003 HERA000037 HM00003 x
HM00004 HERA000038 Trehalase-1B x x
HM00006 HERA000038.1 Trehalase-1A x x
HM00007 HERA000039 B9 protein x x
HM00008 HERA000040 HM00008 x 2 x
HM00010 HERA000041 WD40 repeat domain 85 x x
HM00012 HERA000042 CG2519 x x x
HM00013 HERA000045 Unkempt x 4
HM00014 HERA000046 Histone H3 x x
HM00015 HERA000047 HM00015 x x
HM00016 HERA000048 HM00016 x x
HM00017 HERA000049 RecQ Helicase x x
HM00018 HERA000051 HM00018 x x
HM00019 HERA000052 BmSuc2 x x x
HM00020 HERA000053 CG5796 x x
HM00021 HERA000054 HM00021 x x
HM00022 HERA000055 Enoyl-CoA hydratase x x
HM00023 HERA000056 ATP binding protein x x
HM00024 HERA000057 HM00024 x x
HM00025 HERA000059 cortex x x 56 74 x x x 603 1796 x 2 99 x 51
HM00026 HERA000077 Poly(A)-specific ribonuclease (parn) % 10 1 34 x x
HM00027 HERA000079 CG31320 x x x
HM00028 HERA000080 ARP-like x x x
HM00029 HERA000081 CG4692 x x x
HM00030 HERA000082 Proteasome 26S non ATPase x x x
HM00031 HERA000083 HM00031 x x x x
HM00032 HERA000084 Zinc phosphodiesterase x 1 x x
HM00033 HERA000085 hi eae Kinase, x 8 x x
HM00034 HERA000086 WD repeat domain 13 (Wdr13) 1 4 5 x x
HM00035 HERA000087 Domeless 1 2 x x
HM00036 HERA000061 WAS protein family homologue 1 5 36 37 x x
HM00038 HERA000062 Lethal (2) k05819 CG3054 x 2 x
HM00039 HERA000064 eee protein kinase ‘ i
HM00040 HERA000064.1 DNA excision repair protein ERCC-6 x x
HM00041 HERA000065 Penguin x x
HM00042 HERA000066 Thymidylate kinase x x
HM00043 HERA000067 Caspase-activated DNase x x
HM00044 HERA000068 Regulator of ribosome biosynthesis x x
HM00045 HERA000069 CG12659 x x
HM00046 HERA000070 CG33505 x x
HM00047 HERA000071 Sr protein x x
HM00048 HERA000073 HM00048 x x
HM00049 HERA000073.1 HM00049 x x
HM00050 HERA000074 Shuttle craft x x
HM00051 HERA000075 HM00051 x x
HM00052 HERA000076 HM00052 x x x


Ac, number of above background SNPs associated with the H. numata (Hn) bicoloratus phenotype in this study. A®”, number of SNPs fixed for the alternative allele in H. erato favorinus. AN, number
of above background SNPs associated with the forewing yellow band in this study. A°‘, number of SNPs fixed for the alternative allele in H. erato demophoon. A‘, number of above background SNPs
associated with the hindwing yellow bar in this study. Cr', within the previously mapped HeCr interval''. P!, within the previously mapped P interval’. E!, detected as differentially expressed between
H. melpomene aglaope and amaryllis from RNA-seq data in this study (Supplementary Information). E®', detected as differentially expressed between H. melpomene plesseni and malleti in the gene
array in this study. E8”, detected as differentially expressed between forewing regions in the gene array in this study. E“, numbers of probes showing differential expression between H. melpomene
plesseni and malleti in the tiling array in this study. E‘, numbers of probes showing differential expression between forewing regions in the tiling array in this study. Sb!, within the previously mapped
Sb interval’. Yb!, within the previously mapped Yb interval!. Sb controls a white-yellow hindwing margin and is not investigated in this study. The N locus has not been fine-mapped previously.
Extended Data Table 2 | Locations of fixed or above-background SNPs and differentially expressed (DE) tiling array probes

Positions of SNPs in the He and Hn association analyses

                                      cortex  cortex  cortex   cortex flanking        Other genes  Other
                                      coding  UTR     introns  intergenic             (exons or    intergenic
                                      exons   exons   (nonTE)  (nonTE)          TEs   introns)                 Total
erato favorinus fixed                 2       0       96       8                2     0            0           108
erato demophoon fixed                 0       0       1        5                1     2            6           15
numata bicoloratus above background   1       3       47       16               0     2            0           69

Positions of DE tiling array probes

                    cortex  cortex  cortex                  Other   Other
                    coding  UTR     introns  Known          gene    introns/
                    exons   exons   (nonTE)  miRNAs   TEs   exons   intergenic  Total
Forewing proximal   8       7       323      0        13    1       7           359
Forewing distal     12      2       327      0        8     0       8           357
Hindwing            5       14      378      0        9     1       6           413
malleti             0       1       68       0        0     0       12          81
plesseni            2       4       222      0        10    0       4           242

Forewing proximal   1       0       22       0        3     0       7           33
Forewing distal     2       3       116      1        9     5       112         248
Hindwing            9       10      500      1        20    2       80          622
malleti             0       12      95       0        1     0       0           108
plesseni            3       3       81       0        99    0       0           186
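
The counts in Extended Data Table 2 can be checked for internal consistency, since each row ends with a printed total. The following is a minimal sketch, not part of the original analysis: it transcribes the rows into Python and confirms that the category counts sum to the stated totals. The labels "block 1" and "block 2" are placeholders introduced here, because the row-group headings for the two probe blocks are not legible in the table.

```python
# Each row: counts per genomic category, with the printed row total last.
snp_rows = {
    "erato favorinus fixed": [2, 0, 96, 8, 2, 0, 0, 108],
    "erato demophoon fixed": [0, 0, 1, 5, 1, 2, 6, 15],
    "numata bicoloratus above background": [1, 3, 47, 16, 0, 2, 0, 69],
}

# "block 1"/"block 2" are hypothetical labels for the two groups of probe rows.
probe_rows = {
    ("block 1", "Forewing proximal"): [8, 7, 323, 0, 13, 1, 7, 359],
    ("block 1", "Forewing distal"): [12, 2, 327, 0, 8, 0, 8, 357],
    ("block 1", "Hindwing"): [5, 14, 378, 0, 9, 1, 6, 413],
    ("block 1", "malleti"): [0, 1, 68, 0, 0, 0, 12, 81],
    ("block 1", "plesseni"): [2, 4, 222, 0, 10, 0, 4, 242],
    ("block 2", "Forewing proximal"): [1, 0, 22, 0, 3, 0, 7, 33],
    ("block 2", "Forewing distal"): [2, 3, 116, 1, 9, 5, 112, 248],
    ("block 2", "Hindwing"): [9, 10, 500, 1, 20, 2, 80, 622],
    ("block 2", "malleti"): [0, 12, 95, 0, 1, 0, 0, 108],
    ("block 2", "plesseni"): [3, 3, 81, 0, 99, 0, 0, 186],
}


def row_mismatches(rows):
    """Return the labels whose category counts do not add up to the printed total."""
    return [label for label, (*counts, total) in rows.items() if sum(counts) != total]


print(row_mismatches(snp_rows), row_mismatches(probe_rows))  # prints: [] []
```

An empty list for both tables means every row is arithmetically consistent with its total.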
Extended Data Table 3 | SNPs showing the strongest phenotypic associations in the H. melpomene/timareta/silvaniform comparison

Columns: Species; Race; Sample code; HW bar; SNP pos 457083† (p = 6.07E-10); SNP pos 439063* (p = 1.72E-09); SNP pos 602131‡ (p = 2.42E-09); SNP pos 457056 (p = 2.42E-09); FW band; SNP pos 584465§ (p = 1.37E-07); SNP pos 584418§ (p = 1.41E-07); SNP pos 584633§ (p = 2.10E-07); SNP pos 603344‡ (p = 2.19E-07)
H.melpomene aglaope 09-246 G NA 


H.melpomene aglaope 09-267 
H.melpomene aglaope 09-268 
H.melpomene aglaope 09-357 


H.melpomene aglaope aglaope.1 


H.melpomene amandus 2221 1 AIA NA G/G C/C 0 cic T/T T/T AIA 
H.melpomene amandus 2228 1 AA NA G/G C/C 0 C/T TIA T/C AIA 
H.melpomene  amaryllis 09-332 ; ie AIA G/G eal 0 cic TIT T/T AIA 
H.melpomene amaryllis 09-333 1° T/T AIA G/G T/T 0 cic T/T WAT AIA 
H.melpomene amaryllis 09-075 1 = T/T AIA G/G T/T 0 Cc/C WAL WAN AIA 
H.melpomene amaryllis 09-079 1 T/T AJA G/G T/T 0 Cc/C WAL TT AIA 
H.melpomene  amaryllis amaryllis.11 T/T AIA G/G TT 0 Cc/C nt T/T AIA 
H.melpomene _ bellula 228 nl Way NA G/G T/T 0 Cc/C T/T T/T NA 
H.melpomene _ bellula 231 Hl Way NA G/A T/T 0 C/T TIA T/C NA 
H.melpomene — cythera 2856 1. T/T AJA G/G T/T 0 cic T/T TT AIA 
H.melpomene — cythera 2857 1 0 NA NA NA 


H.melpomene  malleti 17162 
H.melpomene melpomene18038 
H.melpomene melpomene18097 
H.melpomene melpomenem0.06 


H.melpomene melpomenegen_ref 
H.melpomene melpomene13435 
H.melpomene melpomene9315 
H.melpomene melpomene9316 
H.melpomene melpomene9317 
H.melpomene_ plesseni 9156 
H.melpomene_ plesseni 16293 


H.melpomene _ rosina rosina.1 
H.melpomene _ rosina 2071 
H.melpomene _ rosina 531 
H.melpomene _ rosina 533 
H.melpomene _ rosina 546 


H.melpomene _ thelxiopeia 13566 
H.melpomene vulcanus 14632 
H.melpomene vulcanus 519 


H. timareta florencia 2403 
H. timareta florencia 2406 
H. timareta florencia 2407 
H. timareta florencia 2410 
H. timareta timareta 8533 
H. timareta timareta 9184 
H. timareta timareta 8520 
H. timareta timareta 8523 
H. timareta thelxinoe 09-312 i en AJA G/G T/T 0 Cc/C T/T 
H. timareta thelxinoe 8624 i AR AIA G/G T/T 0 C/C T/T 
H. timareta thelxinoe 8628 1. AT AIA G/G T/T 0 Cc/C TIT 
H. timareta thelxinoe 8631 4 0 Cc/C T/T 
H. elevatus 09-343 a 1 on NA 
H.pardalinus _sergestus 09-326 A NA Oo. CIC T/T 


*Downstream of cortex. †Between exons 3 and 4 of cortex. ‡Upstream of cortex. §Between exons U4 and U3 of cortex. None of these SNPs are within known transposable elements. Colours show
phenotypic associations: yellow, yellow hindwing bar; pink, no yellow hindwing bar; green, yellow forewing band; blue, no yellow forewing band; grey, allele does not match expected pattern.


© 2016 Macmillan Publishers Limited. All rights reserved 


Extended Data Table 4 | Transposable elements (TEs) found within the Yb region 
Unique Occurrences

BAC mel ros ama agl No. TE name Superfamily Type
1 1 BEL-1 BEL LTR retrotransposon 
4 CR1-2 Jockey LINE Non-LTR retrotransposon 
1 1 Daphne-1 Jockey LINE Non-LTR retrotransposon 
1 1 Daphne-6 Jockey LINE Non-LTR retrotransposon 
1 1 DNA-like-8 DNA transposon 
1 Helitron-like-14 Helitron_A DNA transposon 
1 2 4 Helitron-like-12 Helitron_A DNA transposon 
1 2 5 Helitron-like-12b Helitron_A DNA transposon 
1 1 1 4 7 Helitron-like-4a Helitron_A DNA transposon 
Helitron-like-4b Helitron_A DNA transposon 
Helitron-N2 Helitron_A DNA transposon 
3 Helitron-like-7 Helitron_A DNA transposon 
5 3 3 1 2 16 Helitron-like-6a Helitron_B DNA transposon 
Helitron-like-6b Helitron_B DNA transposon 
Helitron-like-11 Helitron_B DNA transposon 
2 2 1 1 11 Helitron-like-15 Helitron_B DNA transposon 
6 5 3 1 18 Helitron-like-5 Helitron_B DNA transposon 
1 2 Hmel_Unknown_50 
1 1 2 Hmel_Unknown_174a/b 
1 1 Hmel_Unknown_187b 
1 1 2 Hmel_Unknown_230 
1 Hmel_Unknown_234a 
1 Hmel_Unknown_236a 
1 1 Jockey-4 Jockey LINE Non-LTR retrotransposon 
1 1 LTR-3_gypsy Gypsy LTR retrotransposon 
1 1 Mariner-4 Mariner/Tc1 DNA transposon 
1 3 29 Metulj-0 Metulj SINE Non-LTR retrotransposon 
Metulj-1 Metulj SINE Non-LTR retrotransposon 
Metulj-2 Metulj SINE Non-LTR retrotransposon 
Metulj-3 Metulj SINE Non-LTR retrotransposon 
Metulj-4 Metulj SINE Non-LTR retrotransposon 
Metulj-5 Metulj SINE Non-LTR retrotransposon 
Metulj-6 Metulj SINE Non-LTR retrotransposon 
Metulj-7 Metulj SINE Non-LTR retrotransposon 
nTc3-4 Mariner/Tc1 DNA transposon 
SINE-1 SINE SINE Non-LTR retrotransposon 
1 1 2 nMar-3 Mariner/Tc1 DNA transposon 
1 1 nMar-16 Mariner/Tc1 DNA transposon 
1 1 nMar-12/20 Mariner/Tc1 DNA transposon 
1 1 nPIF-3 PIF/Harbinger DNA transposon 
1 1 nTc3-2 Mariner/Tc1 DNA transposon 
1 2 nTc3-3 Mariner/Tc1 DNA transposon 
1 2 R4-1 R2 LINE Non-LTR retrotransposon 
1 1 6 Rep-1 REP LINE Non-LTR retrotransposon 
2 1 1 4 RTE-3 RTE LINE Non-LTR retrotransposon 
1 2 RTE-11 RTE LINE Non-LTR retrotransposon 
1 3 Zenon-1 Jockey LINE Non-LTR retrotransposon 
1 1 Zenon-3 Jockey LINE Non-LTR retrotransposon 
Early Neanderthal constructions deep in Bruniquel
Cave in southwestern France

Jacques Jaubert¹*, Sophie Verheyden²,³*, Dominique Genty⁴*, Michel Soulier⁵, Hai Cheng⁶,⁷, Dominique Blamart⁴,
Christian Burlet², Hubert Camus⁸, Serge Delaby⁹, Damien Deldicque¹⁰, R. Lawrence Edwards⁷, Catherine Ferrier¹,
François Lacrampe-Cuyaubère¹¹, François Lévêque¹³, Frédéric Maksud¹⁴, Pascal Mora¹⁵, Xavier Muth¹², Édouard Régnier⁴,
Jean-Noël Rouzaud¹⁰ & Frédéric Santos¹

Very little is known about Neanderthal cultures¹, particularly early
ones. Other than lithic implements and exceptional bone tools²,
very few artefacts have been preserved. While those that do remain
include red and black pigments³ and burial sites⁴, these indications
of modernity are extremely sparse and few have been precisely
dated, thus greatly limiting our knowledge of these predecessors
of modern humans⁵. Here we report the dating of annular
constructions made of broken stalagmites found deep in Bruniquel 
Cave in southwest France. The regular geometry of the stalagmite 
circles, the arrangement of broken stalagmites and several traces of 
fire demonstrate the anthropogenic origin of these constructions. 
Uranium-series dating of stalagmite regrowths on the structures 
and on burnt bone, combined with the dating of stalagmite tips in 
the structures, give a reliable and replicated age of 176.5 thousand
years (±2.1 thousand years), making these edifices among the oldest
known well-dated constructions made by humans. Their presence at 
336 metres from the entrance of the cave indicates that humans from 
this period had already mastered the underground environment, 
which can be considered a major step in human modernity. 

Since its natural closing during the Pleistocene period and until 
its discovery⁶ in 1990, no humans entered Bruniquel Cave, located
in southwest France (44° 4’ N, 1° 41’ E, Extended Data Fig. 1a), an 
area already rich in Palaeolithic sites (Extended Data Fig. 1b). Local 
cavers then dug through the collapsed entrance, a 30-m long and nar- 
row passage through which persons can reach the main gallery. The 
structures (Fig. 1 and Extended Data Fig. 2a) are located at 336 m 
from the entrance after an easy walk through speleothem-rich cham- 
bers (Extended Data Fig. 1c). Near the entrance, the remains of large 
Pleistocene fauna and Holocene micro-fauna were found⁷, and bears
also left numerous traces of their presence: hibernation hollows, claw 
marks and a few footprints. The most notable features, however, are 
the strange arrangement of two annular structures made of whole and 
broken stalagmites (Fig. 1 and Supplementary Video 1), accompanied 
by numerous traces of fire (Fig. 1 and Extended Data Fig. 3). Other 
than these structures, signs of human activity are almost non-existent 
and uncertain: a stalagmite tip that seems to have been hollowed out, 
negative prints left by wrenching stalagmites from the ground, and a 
few speleothem pieces in locations other than their original ones. At 
present, no marks on the cave walls or footprints have been observed. 
A first study in the early 1990s provided a detailed plan of the
structures and a single ¹⁴C accelerator mass spectrometry dating of a burnt
bone found in the main structure, giving an intriguing age of >47.6 
thousand years ago (ka; ref. 6). 


The question was whether these unique constructions were made by
Neanderthals⁸,⁹. Unfortunately, the premature death of the
archaeologist F. Rouzaud, along with the restricted access to the cave, prevented
any further research until 2013 when we decided to date and study 
these enigmatic constructions. 

The arranged structures composed of whole and broken stalagmites, 
here designated as ‘speleofacts’ (Extended Data Table 1), are located 
in the largest chamber of the cave (Extended Data Fig. 1c). Our study 
defines two categories of structures: two annular ones, which are the 
most impressive, and four smaller stalagmite accumulation structures 
(Supplementary Video 1). The largest annular structure is 6.7 × 4.5 m,
and the smaller one is 2.2 × 2.1 m. The accumulation structures consist
of stacks of stalagmites and are from 0.55 m to 2.60 m in diameter. Two 
of them are located in the centre of the larger annular construction, 
while the other two are outside of it (Fig. 1). Overall, about 400 pieces 
were used, comprising a total length of 112.4 m and an average weight 
of 2.2 tons of calcite (Extended Data Table 1). Half of the elements 
composing the structures consist of the middle part of stalagmites 
(that is, without the root or tip), and very few pieces are whole (~5%). 
The stalagmites are well calibrated with a mean length of 34.4 cm for 
the large (A) and 29.5 cm for the small (B) annular structures, thus 
strongly suggesting intentional construction (Extended Data Fig. 4). 
Marks left by stalagmite wrenching are seen near the structures, 
though in most cases the original provenance of the stalagmites is 
difficult to determine owing to calcite flowstones covering a large part 
of the cave floor. 

The annular structures are composed of one to four superposed 
layers of aligned stalagmites (Extended Data Fig. 2b). Notably, some 
short elements were placed inside the superposed layers to support 
them (Extended Data Fig. 2d, e). Other stalagmites were placed 
vertically against the main structure in the manner of stays, perhaps 
to reinforce the constructions (Extended Data Fig. 2a—c). All of these 
elements, combined with the large size of the structures, exclude any 
interventions by bears (Supplementary Information Table 2). Although 
bear traces are present throughout the cave (fur, claw marks, paw 
prints), hibernation hollows are observed only in other sectors (End 
Gallery, Bear Hollow Chamber at ~80 m and 240 m from the Structure
Chamber). 

Traces of fire are present on all six structures (Fig. 1). They consist 
of 57 reddened, more or less fissured speleofacts, and 66 blackened 
ones (Extended Data Fig. 3). The red and black colours are clearly 
not related to precipitates from the dripping water since no similar 
traces are observed on the ceiling. Instead, most of the coloured (and 


¹PACEA, UMR 5199 CNRS-UB-MCC University of Bordeaux, 33615 Pessac, France. ²Earth & History of Life, Royal Belgian Institute of Natural Sciences, 1000 Brussels, Belgium. ³AMGC, Vrije
Universiteit Brussel, 1050 Brussels, Belgium. ⁴LSCE, UMR 8212 CNRS-CEA-UVSQ, 91400 Gif-sur-Yvette, France. ⁵Société spéléologique et archéologique de Caussade, 5 rue Bourdelle 82300
Caussade, France. ⁶Institute of Global Environmental Change, Xi’an Jiaotong University, Xi’an 710049, China. ⁷Earth Sciences, University of Minnesota, Minneapolis, Minnesota 55455, USA.
⁸Protée Expert Sas, 30250 Sommières, France. ⁹Faculté Polytechnique, University of Mons, 7000-Mons, Belgium. ¹⁰Laboratoire de Géologie de l'École Normale Supérieure de Paris (ENS), UMR
CNRS 8538, 75000 Paris, France. ¹¹Archéosphère, 11500 Quirbajou, France. ¹²Get in Situ, 1091 Bourg-en-Lavaux, Switzerland. ¹³LIENSs, UMR 7266 CNRS-University of La Rochelle, 17000 La
Rochelle, France. ¹⁴Ministry of Culture, Regional Archaeological Service of Midi-Pyrénées, 31080 Toulouse, France. ¹⁵Archéotransfert, Archéovision, UMS 3657 SHS-3D, 33007 Pessac, France.


*These authors contributed equally to this work. 


2 JUNE 2016 | VOL 534 | NATURE | 111 


Figure 1 | Ortho-image of the Bruniquel Cave structures. The six
structures are composed only of speleothems or fragments of speleothems 
(speleofacts), aligned and superimposed (A, B) (Extended Data Fig. 2a, b), 
or accumulated (C, D, E, F). A’ is a likely extension of A. Their contours 
are sometimes imprecise due to the calcite layer and stalagmitic regrowths 
that cover them. The orange spots represent the heated zones, all located 
on the construction elements. The red spot (structure B) represents a char 
concentration (mainly burnt bone fragments) on the ground (Extended 
Data Fig. 3, bottom left). 


fissured) locations were clearly heated, as confirmed by magnetic 
measurements of the most visibly reddened and blackened zones 
(Extended Data Fig. 5). A char (that is, carbonized organic material) is 
located near structure B, and a dozen black fragments are observed in 
the structures. The largest one is a 6.7-cm-long burnt bone (diaphysis) 
of a bear or large herbivore found on accumulation structure E (Fig. 1).
It was covered by a 6-mm-thick calcite layer that has been precisely 
dated (Extended Data Table 2 and Extended Data Fig. 6a, d, e). The 
calcite surrounding this bone is reddened, blackened and fissured. 
Another black fragment was trapped between the calcite regrowth and 
the structure (Extended Data Fig. 6b). The black fragments and bone 
were clearly heated, as indicated by molecular and atomic spectrom- 
etry (Extended Data Fig. 6). 

The age of the constructions has been determined by uranium-series 
dating of the stalagmite calcite (Supplementary Information SM2): the 
top of stalagmites that are part of the structure give maximum ages 
while the bases of the stalagmite regrowths sealing the structures give 
minimum ages (Supplementary Information SM1). 

Eighteen multi-collector ICP-MS uranium-series ages¹⁰ with 2σ
uncertainties were obtained from the calcite cores extracted from 
the stalagmites (Extended Data Table 2, Supplementary Information 
SM2). 

Four additional samples were also dated: one from a core taken 
in the flowstone pavement inside annular structure A to evaluate its 
contemporaneity with the structures, and three from the calcite layer
that formed on the burnt bone found inside accumulation structure 
E (Fig. 1). 

Of the five calcite regrowths covering the structure, the two oldest
ages are situated in the same time window, that is, 177.9 ± 3.7 ka and
175.2 ± 0.8 ka (Fig. 2 and Extended Data Table 2). They partially overlap
the age of the youngest dated stalagmite in the structure (177.1 ± 1.5 ka,
Extended Data Table 2). All other ages are consistent with this
chronology, showing that the stalagmite tips are contemporary with or
older than the calcite regrowths.

These results indicate that the structure was built between
175.2 ± 0.8 ka and 177.1 ± 1.5 ka (Fig. 3). Moreover, additional
evidence for human presence in the cave at this time (Extended Data
Table 2; Extended Data Figs 5 and 6) is provided by the burnt bone
located in structure E, older than 180.9 ± 20.3 ka, the age of the calcite
that formed on its surface, and the bone fragment trapped inside the
BR-stm-SB7 core, with a minimum age of 175.2 ± 0.8 ka.
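
The two bracketing ages are consistent with the single figure of 176.5 ± 2.1 ka quoted for the structures. The text does not state how that figure was derived; the sketch below assumes it is the midpoint and half-width of the interval spanned by the minimum and maximum ages once their uncertainties are included. The function name `envelope` is introduced here for illustration only.

```python
# Bracketing ages in thousand years (ka), as (value, quoted uncertainty).
minimum_age = (175.2, 0.8)  # regrowth age used as the lower bound in the text
maximum_age = (177.1, 1.5)  # youngest dated stalagmite used in the structure


def envelope(min_age, max_age):
    """Return (midpoint, half_width) of the interval covered by both ages
    when each is extended outward by its uncertainty."""
    lower = min_age[0] - min_age[1]
    upper = max_age[0] + max_age[1]
    return (lower + upper) / 2, (upper - lower) / 2


midpoint, half_width = envelope(minimum_age, maximum_age)
print(f"{midpoint:.1f} ± {half_width:.1f} ka")  # prints: 176.5 ± 2.1 ka
```

Under this assumption the interval runs from 174.4 ka to 178.6 ka, which reproduces the quoted value.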

The age (175.9 ± 5.7 ka) of the calcite flowstone situated inside the
annular structure is similar to that of the main structure within the
margin of error (176.5 ± 2.1 ka), suggesting that the climate during this
period (that is, 175-177 ka), covering part of marine isotope stage 6,
was sufficiently humid and warm to allow continuous calcite
deposition despite generally glacial conditions (Fig. 3). It can be associated
with the warm phase VI-6-5 of the nearby Villars Cave speleothem
record, characterized by low δ¹⁸O and δ¹³C (ref. 11) (Extended Data
Fig. 7). Other European records show a similar climatic pattern, such
as the high percentage of Euro-Siberian pollen in the MD01-2444
marine core off Lisbon between ~175 and 177 ka (ref. 11).

Early Neanderthals were the only human population living in 
Europe during this period¹². Our findings suggest that their society
included elements of modernity, which can now be proven to have 
emerged earlier than previously thought. These include complex spa- 
tial organization, fire use, and deep karst occupation (Extended Data 
Fig. 8b). 

Solid evidence for spatial organization (that is, human
constructions, especially complex ones that required a social organization)
during the Lower or Middle Palaeolithic is rare¹³. One hypothesis for
its emergence postulates a sudden appearance of social organization
with the arrival of modern humans (Homo s. sapiens)¹⁴, while a
second hypothesis claims a more gradual and mosaic emergence during
Neanderthal times in different parts of the world, including Europe¹⁵.
In Europe, however, completely preserved sites are exceptional before
the Upper Palaeolithic (42,000 calibrated years before present)¹⁶ and
taphonomic processes hinder their identification¹⁷,¹⁸. The spatial
organization at Bruniquel Cave is the first one attributed with
certitude to the early Middle Palaeolithic. The use of stalagmites is also
unique for periods older than the Upper Palaeolithic, and implies a 
necessary simultaneous realization of different tasks and consequently, 
the existence of some degree of social organization (Extended Data 
Fig. 8a). The location of the Bruniquel structures inside a cave, where 
they were protected from weathering, animals and humans, played a 
major role in their preservation. 

The first unequivocal use of fire is dated to the Middle Pleistocene
(approximately 0.8 million years ago (Ma))¹⁹ and more than 1 Ma in
southern Africa²⁰, with a more generalized use only after 0.3 Ma. A
critical review of all known remains of fire in Europe²¹ concluded that
Neanderthals were the first to commonly use fire, and in particular at
the end of the Middle Pleistocene when they began to cook and
produce new materials such as organic glue and haft tools. During marine
isotope stage 6, the average number of proven fire uses for 10,000-year
time slices is 1.47, which is very low²⁰. None of these sites is associated
with a deep karst context.

Deep karst occupation does not appear to have occurred in Africa
in any period, whether the Early or Middle Stone Age, or even the Late
Stone Age if we exclude shelters and cave entrances with evidence for
human presence in South Africa, Ethiopia and Maghreb (Extended
Data Fig. 8). The oldest evidence for the appropriation of this
difficult environment is found in Europe²², Southeast Asia/Sunda²³,
Wallacea²⁴ and Australia/Sahul²⁵. The accumulation of human bodies
by Acheuleans at Sima de los Huesos, Spain (0.35 Ma)²⁶ is very
different from the Bruniquel structures, however. In other examples,
the human frequentation of caves is linked to engraving, painting or
sculpting activities. These sites are thus younger than 42,000 calibrated
years before present and are always associated with Homo s. sapiens.
Symbolic, cultural or funerary activities were the main reasons for
these cave visits. Until now no evidence has been found for regular
Neanderthal incursions into caves, except for a possible case of
footprints²⁷, and Neanderthal constructions inside caves, at least at a
distance that is no longer exposed to daylight, were totally unknown.
Moreover, Upper Palaeolithic constructions in caves are limited to
fireplaces, simple hearths, and some rock or speleothem displacements.
Even in caves regularly visited since the Aurignacian, constructions
are non-existent or anecdotal²⁸,²⁹.

Figure 2 | The calcite cores sampled from the structures. BR-stm-SA59,
BR-stm-SA249 and BR-stm-SB7 were cored from the tips of stalagmites
used to build the structures. BR-stm-RB7, BR-stm-RA62 and
BR-stm-RB23 were sampled at the base of stalagmites growing on the
structures. All three cores display regrowth in their upper part as well as
the older underlying stalagmite used as building item. Core BR-PL-P13
was taken from the flowstone located inside the main structure A. Samples
were taken with a 1.6 cm (for BR-stm-SA59) and 2.6 cm diameter (for
the other cores) coring device. Subsamples for uranium-series dating are
indicated with their number (white). The dashed line indicates, within the
deposition of calcite, the moment of the building of the structures, that is,
the limit between the stalagmites used in the structure (speleofact) and the
regrowths. In most cases, this limit is marked by a clay layer. The ages for
samples taken under the dashed line are given below the cores (orange);
the ages for samples taken above the line (yellow) are given above the
cores. The ages given in red are those which give the closest maximum age
for the structures. ky, thousand years. The age of the flowstone inside the
structure A is given in white, since the position corresponding to the time
of construction (dashed line) inside the BR-PL-P13 core is still uncertain.

Figure 3 | Uranium-series ages (with 2σ error bars) obtained from
the structures. Yellow, ages of the calcite covering the burnt bone in
the accumulation structure E; red, ages obtained from the stalagmites
covering the structure (regrowths) and representing a minimum age for
the structure; blue, ages obtained from the stalagmites used by humans
to build the structure (speleofacts) and representing a maximum age for
the structures; black, age obtained from the flowstone partially covering
the inside area of the main structure. The age of the structures is situated
between 175.2 ± 0.8 thousand years (ky) and 177.1 ± 1.5 ky. The calcite
covering the burnt bone is dated to 180.9 ± 20.3 ka, indicating a minimum
age of the bone and adding evidence of earlier human presence in the cave.
The general climatic context is given by the CO₂ concentration variation
(expressed in p.p.m.v., lower right y axis) extracted from the Vostok ice core
record”? (black numbers indicate major marine isotope stages).

What was the function of these structures at such a great distance 
from the cave entrance? Why are most of the fireplaces found on 
the structures rather than directly on the cave floor? Based on most 
Upper Palaeolithic cave incursions, we could assume that they repre- 
sent some kind of symbolic or ritual behaviour’, but could they rather 
have served for an unknown domestic use or simply as a refuge? Future 
research will try to answer these questions. 

The attribution of the Bruniquel constructions to early Neanderthals 
is unprecedented in two ways. First, it reveals the appropriation of a 
deep karst space (including lighting) by a pre-modern human spe- 
cies. Second, it concerns elaborate constructions that have never been 
reported before, made with hundreds of partially calibrated, broken 
stalagmites (speleofacts) that appear to have been deliberately moved 
and placed in their current locations, along with the presence of sev- 
eral intentionally heated zones. Our results therefore suggest that the 
Neanderthal group responsible for these constructions had a level of 
social organization that was more complex than previously thought 
for this hominid species. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
Extended Data Figure 1 | Location and map of Bruniquel Cave.
a, Bruniquel Cave (marked with a star) is located in the southwest of
France, south of the calcareous plateaus of Quercy, east of the Aquitaine
Basin. Its entrance (165 m above sea level) overlooks the Aveyron valley,
a tributary of the Tarn on the right bank of the Garonne and down
from the Massif Central (base map courtesy of M. Jarry). b, Bruniquel
Cave in the Aveyron valley. Orange: Lower Palaeolithic site; red: Middle
Palaeolithic sites; green: early Upper Palaeolithic; blue: late Upper
Palaeolithic (Magdalenian). Circles indicate caves, vertical lines indicate
rock shelters and squares mark open-air sites. *Decorated caves. In
this area within a 30 km zone around Bruniquel Cave, fifteen major
Palaeolithic sites are known. The oldest known human occupations in
this region are those of the Igue des Rameaux (Tarn-et-Garonne),
a karstic sinkhole where lithic material was associated with a recent
mid-Pleistocene fauna, dated from marine isotope stages 9 to 5 (ref. 31).
A Middle Palaeolithic, stratified open-air site is also present at La
Rouquette-Puycelsi (Tarn) upstream on the nearby Vère River. The
other sites are all attributable to the Upper Palaeolithic, representing
the Aurignacian, Gravettian and Solutrean periods, but mainly the
Magdalenian period with three decorated caves: Travers de Jannoye,
La Magdeleine-des-Albis (Penne, Tarn) and Mayrière (Bruniquel,
Tarn-et-Garonne)³² (base map, courtesy of StepMap GmbH, modified by
J.J.). c, Topography of Bruniquel Cave. The cave consists of a 10-15 m wide
and 4-7 m high corridor, currently known to be 482 m long. Beyond the
narrow entrance passage (filled porch), there are no major topographic
difficulties until the chamber containing the structure at 336 m from the
unobstructed entrance. Currently, no other access has been identified,
laterally or at the other end. In this latter case, a second obstructed
entrance would be at least 295 m from another slope. (Sources: structure
drawn by M. Soulier and F. Rouzaud, 1992; topography realized by
Protée-Expert & Get in Situ, 2015; Digital Elevation Model generated
with 1957 aerial photography IGN, public domain).


© 2016 Macmillan Publishers Limited. All rights reserved 


LETTER 


f
Speleofacts                     Structures                                      Total
                                A       B       C       D       E       F
Total number                    267     49      9       53      15      6       399
Maximum length (m)   internal   5.80    1.60
                     external   6.70    2.20    2.60    1.30    1.16    0.55    -
Maximum width (m)    internal   3.70    1.50
                     external   4.50    2.10    0.90    1.15    0.85    0.50    -
Circumference (m)    internal   16      5.45
                     external   20.65   7.4     0.60    4.05    3.65    1.60    37.95
Surface (m²)         internal   16.3    2.3
                     external   34      3       0.55    15      1       0.3     29.35
Traces of fire                  12      1       1       2       1       1       18

Extended Data Figure 2 | Bruniquel Cave structures. a, General view
of the main structure (structure A) with superposed layers of aligned
stalagmites (speleofacts). Photo courtesy of E. Fabre, SSAC. b, Example
of speleofacts accumulated over three or even four horizontal levels. c,
Stalagmites (speleofacts) placed vertically against the main structure
(structure A) in the manner of stays. d, e, Two examples of short back
stalagmites serving as sustaining pieces. f, Summary of the metric data of
the structures.

Extended Data Figure 3 | Fireplaces and heated areas. a, Examples of a
fireplace on the main structure. Note the reddened, blackened and fissured
stalagmites. The structure in this location (top) is covered by white, more
recent and still active stalagmites. The heated areas on the speleofacts
correspond to the red and grey colours, as well as fissuring and superficial
spalling. These scars are similar to thermal alterations studied in the cave
of Chauvet-Pont d’Arc (Ardèche). In our current stage of observation,
the study of their distribution enabled us to identify a well-preserved
fireplace in structure A, as well as structures that have been disturbed by
processes that remain to be determined (structures D and E, for example).
b, Numbers per structure of heated areas, thermal spalling, fissured spots
and blackened elements (that is, speleofacts) and soot.

b
Burnt remains                   Structures                                      Total
                                A       B       C       D       E       F
Heated areas by structure (#)   2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 (structure A)
Total                           12      1       1       2       1       1       18
Blackened elements/soot         46 24 1 8 66

Extended Data Figure 4 | Statistics of the speleofacts. a, b, Kernel density
estimates for the dimensions (a, length and b, diameter) of speleofacts
across the different structures. Structure A can be distinguished from the
others by the presence of very large speleofacts. Such speleofacts are not
present in structure D, and only rarely in structure B. Structure C, despite
its very small size, is worth considering due to the large dimensions of its
speleofacts. Structures E and F, with only a few speleofacts and no specific
features, are not represented here. A Kruskal-Wallis test conducted on
the structures represented here shows significant differences in both
the median length and the median diameter across structures (P < 0.05).
A post hoc analysis of the diameter with Hochberg’s adjustment method
distinguishes structure C from the three others. c, The weight of the
speleofacts is estimated by the following formula: πD²Lρ/12 × (1 + d/D +
d²/D²), where D is the maximum diameter, d the minimum diameter, L the
maximal length, and ρ the calcite density. These weights can be roughly
estimated by considering the speleofacts as truncated cones. As their maximum
length L, maximum diameter D, and minimum diameter d are known,
their volume can be easily estimated (Extended Data Table 1). Their
weight is then obtained by multiplying the previous quantity by the calcite
density ρ, which lies between 2.5 and 2.8 g cm⁻³ depending on
its porosity and detrital contamination. Minimal weights are obtained
using a density of 2.5 g cm⁻³. d, The figure shows the mean weights and
their 95% confidence interval in each structure. e-g, The orientation data
(Schmidt diagram) of the speleofacts in the three main structures (A, B, D)
are very similar (e, structure A; f, structure B; g, structure D) and do not
show any preferential direction. The distance to the centre of the circle
represents the slope; the distribution of the speleofacts is isotropic and
mostly planar. This confirms in all cases that such orientation and slope
patterns cannot be due to natural processes related to water flow, mass
flows or other gravitational processes, which in any case would not have
resulted in the current geomorphology of the cave in this sector.
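
The truncated-cone weight estimate described in panel c can be sketched in a few lines of Python. This is only an illustration of the formula quoted in the caption, not the authors' own script: the function name, the choice of centimetres as input unit and the example dimensions are assumptions made here for clarity.

```python
import math


def speleofact_weight_kg(max_diameter_cm, min_diameter_cm, length_cm,
                         density_g_cm3=2.5):
    """Weight of a speleofact modelled as a truncated cone.

    Implements pi * D^2 * L * rho / 12 * (1 + d/D + d^2/D^2), where D is the
    maximum diameter, d the minimum diameter, L the length and rho the
    calcite density (2.5 to 2.8 g/cm^3 according to the caption).
    """
    ratio = min_diameter_cm / max_diameter_cm
    volume_cm3 = (math.pi * max_diameter_cm ** 2 * length_cm / 12.0
                  * (1.0 + ratio + ratio ** 2))
    return volume_cm3 * density_g_cm3 / 1000.0  # grams to kilograms


# Hypothetical speleofact (illustrative dimensions, not a measured specimen).
low = speleofact_weight_kg(9.4, 7.5, 31.7, density_g_cm3=2.5)
high = speleofact_weight_kg(9.4, 7.5, 31.7, density_g_cm3=2.8)
print(f"low estimate: {low:.2f} kg, high estimate: {high:.2f} kg")
```

Running the function once with each end of the density range gives the low and high estimates for a single piece; summing over all pieces of a structure would give totals of the kind reported in Extended Data Table 1.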

Extended Data Figure 5 | Magnetic survey above the structures. Red
circles: main recognized hearths. The magnetic survey aims to reveal the 
locations that were heated, including hearths or smaller fireplaces through 
the detection of magnetic anomalies. The first archaeological applications 
of this prospection method are for the location of heated archaeological 
structures (see pages 422-519 of ref. 38). The magnetic properties 
enhancement by heating was first demonstrated for soils, and then on
the substrate of caves. In this type of hydromorphic environment, iron
is present as nonmagnetic or weakly magnetic FeOOH minerals, such as
goethite (see pages 375-421 of ref. 38). In these conditions, a temperature
rise above 200-250 °C induces dehydration of the FeOOH present in
clay material to Fe₃O₄ (magnetite), which is a strongly magnetic mineral.
The increase of magnetic susceptibility induced by heating offers
information similar to that from thermoluminescence methods. In the present case,
a magnetic susceptibility increase beyond a factor of two was observed 
after heating a clay sample of the cave. Therefore, the heated clay-like 
material, even if present only in small amounts in speleothems, acquired 
a sufficiently high magnetization to generate a local deformation of the
Earth's magnetic field, also called an anomaly. As this deformation decreases
when the source distance increases (see pages 422-519 of ref. 38),
a larger anomaly with a medium intensity might reveal a hearth under the
stalagmitic floor (between structures B and C), calcite being magnetically
nearly neutral (diamagnetic). Magnetic surveying at high
spatial resolution for the detection of palaeohearths in prehistoric caves is a
recent innovation. The magnetic field explored above the structures was
over one metre thick, with a dual sensor G858 Geometrics magnetometer 
with an extended cable. A 360° prism was inserted between both sensors, 
which were superposed at a distance of 0.22 m. These elements were hung 
at the end of a telescopic boom pole and fixed on a tripod. 3D geolocation 
measurements were ensured by tracking with a Trimble S8 total station 
following the 360° prism. This apparatus allows coverage of a volumetric 
space up to 5 m from the operator with ten measurements per second
while controlling the space covered. Extended Data Fig. 5 presents the
results of the magnetic measurements. Altitude contour lines (8.5cm 
distance interval) are extracted from photogrammetric data. The magnetic 
intensity point cloud is a bottom view of the magnetic field intensity 
gradient, that is, the difference in magnetic field intensity as measured 
between the bottom and top sensors. As the local past and present 
magnetic field have an inclination of ~63° down, a magnetic source 
generates a dipolar local deformation of the magnetic field with a negative 
anomaly to the north and positive to the south. In Extended Data Fig. 5,
a dipole corresponds to a blue and red spot aligned approximately
north-south. Most of the main observable dipoles of metric dimension
are associated with fire traces (reddened, blackened calcite) observed
on the horizontally positioned stalagmites, for example, the heated zone
of the structures D and E. Increases of magnetic viscosity, known as a fire
marker, are observed in such zones. Some places present split positive
anomalies, for example, places located on structure D, indicating twin core 
fires or non-contemporaneous fires. The main measured dipole is located 
to the west of structure B at the border of a zone covered by a calcite layer 
and near a char concentration zone, which suggests the occurrence of a 
hearth underneath the flowstone. Some visible heated zones did not reveal 
any magnetic anomaly, indicating that the substratum at these places was 
heated below 200-250 °C. The most tenuous dipoles located on the flat 
ground surface may reflect the changing nature of the substratum, rather 
than any heating. Indeed, the weak magnetic contrast between clay material 
and calcite material can be the source of a weak anomaly. An alternative
explanation is the presence of a heated zone underneath a thick stalagmitic
floor, the distance between source and measurement attenuating the
anomaly, as may be the case for the anomaly located midway between
structures B and C. Complementary analysis of the spatial distribution of the clay
material is needed to determine which hypothesis is correct.

Extended Data Figure 6 | Burnt bone fragments. Three black fragments
(a, b, c) were analysed with a scanning electron microscope energy
dispersive spectrometry probe (SEM-EDS) (e, f), Fourier transform infrared
spectroscopy, FTIR (d) and Raman spectrometry (g, h, i). FTIR analyses were made at
the Laboratoire de Physique des Solides (LPS), Paris-XI University, Orsay 
by S. Mariot on a Nicolet iS50 ABX spectrometer. Raman spectroscopy 
was performed with an Invia spectrometer from Renishaw and the atomic 
spectrometry was performed with a FE-SEM Zeiss Sigma equipped with 
an EDS probe at the Ecole Normale Supérieure, Paris, France. a, A 6.7-cm- 
long piece of burnt bone (Br-SE-Os) trapped between stalagmite elements 
in structure E (Extended Data Fig. 5) was almost completely covered by 
calcite except on its medullar side. Three layers were sampled for uranium- 
series dating (green, red and blue marks) (Extended Data Table 2). The 
bone with the 5-mm-thick calcite crust was cut longitudinally and the 
calcite was sampled along deposition layers, starting at the internal surface 
after removing the bone material. Three thin discontinuities marked by 
thin brownish layers separate the deposits into three calcite layers from 
which three ²³⁰Th samples were taken (Extended Data Table 2). Except for the
middle sub-sample, which was contaminated by detrital elements (high
²³²Th concentration), the ²³⁰Th ages given by the other two sub-samples are in
stratigraphic order and in agreement with the age of the structures. This
demonstrates that humans introduced this bone before 180.9 ± 20.3 ka.
Note the elongated medulla cells of the bone and their deep black colour,
suggesting that the collagen was carbonized at a temperature between
300 and 400 °C. Note that the burnt bone was covered by a speleofact
that had been reddened and blackened by the heat (Extended Data Fig. 5). d, FTIR
spectroscopy (blue spectrum on the black part of the bone, green spectrum
on the grey part of the bone, red spectrum on the overlying calcite crust
and grey spectrum on a modern char) shows well-characterized PO₄
absorbance features suggesting that the bone was burnt: the slightly
more individualized peak at ~618 cm⁻¹, and the splitting factor (SF)
calculated with the heights of the 603 and 565 cm⁻¹ peaks, which is
here relatively high (4.6 to 4.8) and typical of burnt bones.
g, Raman spectrometry displays two well-defined peaks at 1,580 cm⁻¹ and
at 1,350 cm⁻¹, characteristic of char, demonstrating that it was burnt.
b, Sample Br-SB7 is a 3-mm-large black fragment found trapped in the core
of Br-stm-SB7 (Fig. 2). This fragment is situated just below the base of the
regrowth dated to 175.2 ± 0.8 ka, and just above the ancient surface of the
‘old’ stalagmite (whose layers have been dated to 222.4 ± 5.8 ka). h, Raman
spectra of this black fragment display two well-defined peaks at 1,580 cm⁻¹
and at 1,350 cm⁻¹ characteristic of char carbon. e, SEM-EDS shows
the presence of phosphorus, in addition to carbon, suggesting that it is
a burnt bone fragment, similar to the larger bone piece (a). Because it is
trapped in the dated calcite core, it also demonstrates that the fire occurred
before 175.2 ± 0.8 ka. c, A black aggregate of millimetre-sized fragments
(Br-PS92), mainly burnt bones of 1-3 cm, was collected in 1992 by
F. Rouzaud in the char concentration zone near structure B (Extended
Data Fig. 5), and analysed recently. i, As with the previous samples, the
Raman spectrum is typical of char carbon with vibrational bands at
1,580 cm⁻¹ and 1,350 cm⁻¹. f, The SEM images (back scattered mode)
show a blend of at least three phases at the micrometre scale. The
elemental analyses performed by EDS on each of these phases allow their
attribution to a carbonaceous component (the EDS spectrum shows
a major peak of carbon), a phosphorus component (the three major
peaks (Ca, P and O) strongly suggest a phase belonging to the apatite
family), and a clay component (attested by the coexistence of the three
major peaks Si, Al, O), respectively. The Raman spectra demonstrate that
the carbonaceous component is a char, that is, a carbonaceous solid
resulting from the heat treatment of an organic precursor. These results 
confirm that the char concentration zone near structure B was most 
probably a hearth, and that humans burned bones on the clay-like soil of 
the cave. 
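
The caption states only that the splitting factor (SF) is calculated from the heights of the 603 and 565 cm⁻¹ peaks. A minimal sketch is given below under the assumption that the common convention is followed, in which the two peak heights are summed and divided by the height of the valley between them, all measured above a shared baseline. The function name, the valley term and the example absorbance values are illustrative assumptions and are not taken from the paper.

```python
def splitting_factor(height_565, height_603, height_valley):
    """Infrared splitting factor of bone mineral.

    Assumed convention: (h565 + h603) / h_valley, with every height measured
    above the same baseline. The valley is the minimum between the two
    phosphate peaks; its use here is an assumption, since the caption names
    only the two peaks.
    """
    if height_valley <= 0:
        raise ValueError("valley height must be positive")
    return (height_565 + height_603) / height_valley


# Hypothetical baseline-corrected absorbances (not measured values).
sf = splitting_factor(height_565=0.55, height_603=0.60, height_valley=0.25)
print(f"SF = {sf:.1f}")
```

Under this convention, sharper and better separated phosphate peaks give a shallower valley relative to the peaks and hence a higher SF, which is why high values are read as a sign of heating.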

Extended Data Figure 7 | Calcite core stable isotope results. a, b, Stable
isotope measurements (calcite δ¹⁸O and δ¹³C) were made on parts of cores
extracted from the structure to check the coherency of the isotope signal
with an already published time series from speleothems from the Villars
Cave (Dordogne), located 100 km to the northwest of Bruniquel Cave.
The results reveal a good match between the average δ¹⁸O of regrowths
after 176 ka and the Vil-car1 flowstone stable isotopes. This is also true
for the sample that covers marine isotope stage 5e, with a much lower
amplitude change, however. The Bruniquel core δ¹³C signal appears more
variable, possibly due to a greater sensitivity of the vegetation density to
climatic changes or to detrital contaminations, which are probably close
to the discontinuity at the base of the regrowths (b). Higher resolution
measurements combined with more uranium-series dating will allow
the construction of short palaeoclimatic time series and more detailed
observations of climatic variations. Today, the Structure Chamber has an
extremely stable temperature of 12.68 ± 0.02 °C (two times the standard
deviation of the temperature values measured during one year with a time
step of 1.5 h) compared to the outside temperature over the same period
(13.2 ± 8.8 °C). These results indicate the current confinement of the cave
environment, important for isotopic studies.

Extended Data Figure 8 | Human appropriation of the underground
environment: above, the specific task sequence in Bruniquel Cave (a).
Below, its placement within the general context of various indicators
of modern behaviour (b). a, Chaîne opératoire (task sequence) of the
construction of the structures in Bruniquel Cave. This type of construction
implies the beginnings of a social organization: this organization could
consist of a project that was designed and discussed by one or several
individuals, a distribution of the tasks of choosing, collecting and
calibrating the speleofacts, followed by their transport (or vice versa)
and placement according to a predetermined plan. This work would also
require adequate lighting. The construction of such a structure, involving
the placement and arrangement of speleofacts, supposes a minimum
degree of skill, since architectural techniques such as inserting wedging
elements between two rows of speleofacts (Extended Data Fig. 2d, e),
or placing stays to act as buttresses (Extended Data Fig. 2c), appear
to have been used. We evaluated the number of speleofacts used
(approximately 400), as well as their combined weight (between 2.1
and 2.4 tons), but not yet the number of hours necessary to build the
structures. This will require long and complex experimental procedures
that will be undertaken in future research. The complexity of the structure,
combined with its difficult access (335 m from the cave entrance), is a sign
of a collective project and therefore suggests the existence of an organized
society that was already on the path to ‘modernity’. Until now, no site of
this age, attributed to Neanderthals (even late ones) or early modern
humans, has been associated with such activities in an underground space.
b, A multiple species model for the origin of behavioural modernity
in Europe. Modified from ref. 15, to which ‘Deep Cave Occupation’
and ‘Bruniquel Cave’ were added.

Extended Data Table 1 | Speleofacts: definition and archaeometry

Speleofacts                                           Structures                                                        Total
                                                      A          B          C          D          E          F
Number           Number of speleofacts                267        49         9          53         15         6          399
                 % / total                            66.92%     12.28%     2.26%      13.28%     3.76%      1.50%      100
Length (m)       Total length of the speleofacts (m)  83.74      13.95      3.28       11.17      4.14       1.40       117.68
                 % / total                            71.16%     11.85%     2.79%      9.50%      3.52%      1.19%      100
                 Average length                       31.7       28.5       36.4       21.5       27.6       23.3       29.8
Weight (kg)      High estimate                        1,771.28   280.28     77.97      164.65     63.45      27.42      2,385.04
                 Low estimate                         1,581.50   250.27     69.62      147.01     56.65      24.48      2,129.52
                 % / total                            74.27%     11.75%     3.27%      6.90%      2.66%      1.15%      100
                 Average weight                       6,251      5,107.5    8,702.4    3,127.8    4,046.3    6120
Diameter Ø (cm)  Ø "root" (extraction)                16.50      15.48      28.50      13.94      16.50      -
                 Maximum Ø                            9.41       8.91       11.17      8.54       7.95       9.33
                 Minimum Ø                            7.47       7.16       9.02       7.14       6.80       8.87


A speleofact is defined as any element extracted from a speleothem (stalagmite, stalactite, drapery, flowstone, stalagmitic column, etc.) with the intent to use it for a precise purpose, thus removing it 
from its original formation location. This use is linked to a human activity, such as in the realization of any type of modification or construction, use as a utensil or for decoration, or for any other 
purpose. From the moment it is collected, the element in question attains a status that is distinct from its natural formation context, whether or not it is transformed by flaking, shaping, retouching, 
striking, engraving, painting, etc. Speleothems that have been clearly worked (flaked, shaped, retouched, pecked, engraved, painted, etc.) while remaining in situ should also be included in this definition. 

Extended Data Table 2 | Speleothem ²³⁰Th dating results

Core          Sample   ²³⁸U         ²³²Th              ²³⁰Th/²³²Th      δ²³⁴U*       ²³⁰Th/²³⁸U       ²³⁰Th age (ky)  ²³⁰Th age (ky)     δ²³⁴U initial**  ²³⁰Th age (ky BP)***
number        number   (ppb)        (ppt)              (atomic × 10⁻⁶)  (measured)   (activity)       (uncorrected)   (corrected)        (corrected)      (corrected)

Speleothems in the structures (‘speleofacts’)
BR-stm-SA249  2491     205.4 ± 0.3  16,704 ± 335       256 ± 5          466.2 ± 2.4  1.2620 ± 0.0028  178.5 ± 1.2     177.1 ± 1.5        769 ± 5          177.1 ± 1.5
BR-stm-RA62   621      90.5 ± 0.1   14,623 ± 293       146 ± 3          898.7 ± 2.8  1.4299 ± 0.0032  130.6 ± 0.6     128.6 ± 1.6        1,292 ± 7        128.5 ± 1.6
BR-stm-RA62   622      154.9 ± 0.2  30,279 ± 607       125 ± 3          593.6 ± 2.3  1.4766 ± 0.0027  206.8 ± 1.3     204.0 ± 2.3        1,056 ± 8        203.9 ± 2.3
BR-stm-RA62   623      133.4 ± 0.1  4,037 ± 81         811 ± 16         583.5 ± 1.4  1.4885 ± 0.0019  215.0 ± 0.9     214.6 ± 1.0        1,069 ± 4        214.6 ± 1.0
BR-stm-RA62   624      98.9 ± 0.1   1,131 ± 23         2,309 ± 46       573.1 ± 1.7  1.6008 ± 0.0019  273.6 ± 1.7     273.5 ± 1.7        1,240 ± 7        273.4 ± 1.7
BR-stm-SB7    71       330.8 ± 0.8  150,823 ± 3,036    50 ± 1           462.0 ± 3.3  1.3907 ± 0.0044  229.8 ± 2.7     222.5 ± 5.8        866 ± 15         222.4 ± 5.8
BR-stm-SB7    72       134.8 ± 0.2  242,876 ± 4,870    13 ± 1           306.0 ± 2.5  1.4098 ± 0.0086  484.4 ± 40.1    456.8 ± 38.4       1,111 ± 122      456.7 ± 38.4
BR-stm-SB7    73       68.2 ± 0.1   1,939 ± 39         809 ± 16         612.9 ± 1.7  1.3954 ± 0.0021  175.7 ± 0.7     175.3 ± 0.8        1,005 ± 4        175.2 ± 0.8
BR-stm-RB7    74       95.1 ± 0.1   30,522 ± 611       70 ± 1           548.3 ± 1.8  1.3574 ± 0.0028  183.0 ± 1.0     178.0 ± 3.7        906 ± 10         177.9 ± 3.7
BR-stm-RB7    75       246.2 ± 0.3  31,548 ± 632       176 ± 4          401.9 ± 1.9  1.3684 ± 0.0019  254.6 ± 1.8     252.5 ± 2.3        820 ± 7          252.5 ± 2.3
BR-stm-RB7    76       219 ± 0.2    45,447 ± 910       106 ± 2          267.5 ± 1.6  1.3345 ± 0.0020  410.1 ± 7.3     406.8 ± 7.5        843 ± 18         406.7 ± 7.5
BR-stm-RB7    77       133 ± 0.2    9 ± 1              48,099 ± 2,907   801.2 ± 1.8  0.2040 ± 0.0012  13.0 ± 0.1      13.0 ± 0.1         831              12.9 ± 0.1
BR-stm-SA59   59       224.8 ± 0.2  744 ± 15           8,828 ± 178      922.1 ± 1.8  1.7723 ± 0.0020  193.5 ± 0.7     193.4 ± 0.7        1,592 ± 4        193.4 ± 0.7
BR-stm-RB23   231      92.5 ± 0.1   1,595 ± 32         1,241 ± 25       401.6 ± 1.7  1.2991 ± 0.0017  217.9 ± 1.2     217.7 ± 1.2        742 ± 4          217.6 ± 1.2
BR-stm-RB23   232      87 ± 0.1     4,889 ± 98         421 ± 8          688.1 ± 2.1  1.4422 ± 0.0026  169.1 ± 0.8     168.3 ± 1.0        1,106 ± 5        168.3 ± 1.0
BR-stm-RB23   233      147 ± 0.2    34,055 ± 683       91 ± 2           433.6 ± 1.7  1.2724 ± 0.0022  193.1 ± 1.0     189.2 ± 3.0        740 ± 7          189.1 ± 3.0
BR-stm-RB23   234      76 ± 0.1     157 ± 3            10,295 ± 209     409.7 ± 1.6  1.2859 ± 0.0019  208.2 ± 1.1     208.2 ± 1.1        737 ± 4          208.1 ± 1.1

Flowstone inside Structure A
BR-PL-P13     13       163.2 ± 0.2  82,660 ± 1,655     45 ± 1           564.0 ± 1.8  1.3757 ± 0.0034  183.8 ± 1.2     175.9 ± 5.7        927 ± 15         175.9 ± 5.7

Stalagmites on the collapsed rocks in the entrance zone
BR-stm-3      31       41.8 ± 0.1   2,780 ± 56         226 ± 5          203.2 ± 1.5  0.9104 ± 0.0020  144.5 ± 0.8     143.0 ± 1.3        304 ± 3          142.9 ± 1.3
BR-stm-3      32       15.7 ± 0.1   14,278 ± 286       17 ± 1           139.1 ± 3.7  0.9476 ± 0.0080  181.2 ± 4.1     157.1 ± 17.9       217 ± 12         157.0 ± 17.9
BR-stm-2      21       32.5 ± 0.1   21,647 ± 434       25 ± 1           131.7 ± 2.8  0.9959 ± 0.0050  211.0 ± 3.6     194.0 ± 12.6       228 ± 9          193.9 ± 12.6
BR-stm-2      22       28.1 ± 0.1   13,737 ± 275       34 ± 1           143.0 ± 2.6  0.9975 ± 0.0035  204.9 ± 2.5     192.9 ± 8.9        246 ± 8          192.8 ± 8.9

Flowstone on the collapsed rocks at the beginning of the main gallery
BR-PL-1       1        68.5 ± 0.1   4,066 ± 81         316 ± 6          132.9 ± 1.4  1.1389 ± 0.0017  367.4 ± 5.8     366.1 ± 5.8        373 ± 7          366.0 ± 5.8

Calcite on the burnt bone
Calos         Calos-1  136 ± 0.1    210,059 ± 4,206    15 ± 0           497.7 ± 1.9  1.3792 ± 0.0045  208.2 ± 1.9     180.9 ± 20.3       829 ± 48         180.9 ± 20.3
Calos         Calos-2  117 ± 0.2    500,389 ± 10,028   6 ± 0            467.3 ± 2.0  1.5061 ± 0.0102  296.3 ± 8.4     198.6 ± 9.5 × 10”  818 ± 504        198.5 ± 9.5 × 107"
Calos         Calos-3  50 ± 0.1     55,973 ± 1,122     16 ± 0           463.7 ± 2.4  1.0834 ± 0.0044  132.3 ± 1.1     110.6 ± 15.8       634 ± 28         110.6 ± 15.8

The table shows the dating of stalagmites used to build the structures (speleofacts), those of the stalagmites that grew on the structures (regrowths, in darker lines), the flowstone inside the main
structure A, and the calcite on the entrance collapse. The dating results of the calcite deposited on the burnt bone found in the structures are shown in the last lines. One date was rejected (Calos-2)
due to its high uncertainty. Corrected ²³⁰Th ages assume an initial ²³⁰Th/²³²Th atomic ratio of 4.4 ± 2.2 × 10⁻⁶, which is the value for a material at secular equilibrium, with the bulk earth ²³²Th/²³⁸U
value of 3.8. Errors are arbitrarily assumed to be 50%. Age uncertainties are given as 2σ.
p.p.b., parts per billion, 1 × 10⁻⁹; p.p.t., parts per trillion, 1 × 10⁻¹²; BP, before present, with the present defined as 1950 AD.

A shared neural ensemble links distinct contextual
memories encoded close in time 


Denise J. Cai, Daniel Aharoni, Tristan Shuman, Justin Shobe, Jeremy Biane, Weilin Song, Brandon Wei,
Michael Veshkini, Mimi La-Vu, Jerry Lou, Sergio E. Flores, Isaac Kim, Yoshitake Sano, Miou Zhou, Karsten Baumgaertel,
Ayal Lavi, Masakazu Kamata, Mark Tuszynski, Mark Mayford, Peyman Golshani & Alcino J. Silva


Recent studies suggest that a shared neural ensemble may link
distinct memories encoded close in time. According to the
memory allocation hypothesis, learning triggers a temporary
increase in neuronal excitability that biases the representation
of a subsequent memory to the neuronal ensemble encoding the first 
memory, such that recall of one memory increases the likelihood 
of recalling the other memory. Here we show in mice that the 
overlap between the hippocampal CA1 ensembles activated by two 
distinct contexts acquired within a day is higher than when they 
are separated by a week. Several findings indicate that this overlap 
of neuronal ensembles links two contextual memories. First, fear 
paired with one context is transferred to a neutral context when 
the two contexts are acquired within a day but not across a week. 
Second, the first memory strengthens the second memory within 
a day but not across a week. Older mice, known to have lower CA1
excitability, do not show the overlap between ensembles, the
transfer of fear between contexts, or the strengthening of the second 
memory. Finally, in aged mice, increasing cellular excitability and 
activating a common ensemble of CA1 neurons during two distinct 
context exposures rescued the deficit in linking memories. Taken 
together, these findings demonstrate that contextual memories 
encoded close in time are linked by directing storage into 
overlapping ensembles. Alteration of these processes by ageing could 
affect the temporal structure of memories, thus impairing efficient 
recall of related information. 

Contextual memories are encoded in discrete and sparse populations
of neurons in the hippocampus. Recent findings demonstrated
that increasing the relative neuronal excitability of a subset of neurons
increases the probability that those neurons will participate in a
memory trace. While previous studies used viral vectors to manipulate
excitability, temporary increases in excitability occur naturally
following learning, including in the hippocampus. Therefore, two
distinct memories could be linked across time because the temporary
increase in excitability would bias the storage of a subsequent memory
to many of the same neurons that encoded the first memory, such that
recall of one of these events would also probably lead to recall of the
other, a key prediction of the memory allocation hypothesis.

To investigate the neuronal ensembles encoding multiple memories,
we constructed an open-source, head-mounted, miniature fluorescent
microscope to image in vivo calcium transients in CA1 neurons
using GCaMP6f. With this approach we tracked the activation of the
same neurons in mice as they freely explored three distinct novel contexts
across multiple days (Fig. 1a-c, Extended Data Figs 1 and 2). We
recorded CA1 neurons activated by three different contexts separated
by either 5h or 7 days. Previous studies show transient learning-dependent
increases in neuronal excitability and we confirmed
that 5h after context exposure there was an increase in excitability in
CA1 neurons that encoded the context (Extended Data Fig. 3c, d).
Therefore, we predicted that the overlap between the neural representations
of two contexts separated by 5h would be higher than the overlap
of the neural representations of two contexts separated by 7 days.

We exposed mice to three distinct, novel contexts. A and C were 
separated by 7 days; B and C were separated by 5h. Using miniature 
microscopes, we imaged active CA1 neurons during each context 
exploration (Fig. 1d). We found more overlap between the neural 
ensembles encoding B and C, spaced 5h apart, than between the neural 
ensembles encoding A and C, spaced 7 days apart (Fig. 1f, Extended 
Data Fig. 4a, b). Notably, this difference was not due to differences in 
the total number of active CA1 cells in the three contexts (Fig. 1e). We
confirmed these findings with the TetTag transgenic system, a non-invasive
technique that allowed us to tag neurons active during the
exploration of two contexts (Fig. 2a, b, Extended Data Fig. 3a, b).
We used this transgenic approach to tag the neural ensemble activated
by exploration of an initial novel context (GFP+) and compared this
population to the ensemble activated by exploration of a second distinct,
novel context (using ZIF immunohistochemistry), either 5h
or 7 days later (Fig. 2c-e). When the two contexts were separated by
7 days, the overlap between the two ensembles was similar to what was
expected due to chance (Fig. 2f), indicating that independent populations
of neurons encoded the two distinct contexts. However, when the
two contexts were separated by 5h, overlap between neuronal ensembles
was significantly above chance levels and higher than in the 7 days
group (Fig. 2f). Together, the calcium imaging and TetTag data provide
converging evidence that overlapping neural ensembles encode distinct 
contexts when these contexts are separated by 5h, but not by 7 days. 
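
The comparison with chance can be made concrete with a short sketch. The text above does not spell out how the chance level was computed; the version below assumes the usual independence model, in which the expected fraction of doubly labelled cells is the product of the two single-label fractions. The function names and the cell counts are hypothetical and serve only to illustrate the calculation.

```python
def chance_overlap(n_first, n_second, n_total):
    """Expected fraction of cells active in both contexts if the two
    ensembles were drawn independently (assumed chance model)."""
    return (n_first / n_total) * (n_second / n_total)


def overlap_relative_to_chance(n_both, n_first, n_second, n_total):
    """Observed double-labelled fraction divided by the chance fraction.

    A value near 1 indicates independent ensembles; a value above 1
    indicates more shared cells than independence would predict.
    """
    observed = n_both / n_total
    return observed / chance_overlap(n_first, n_second, n_total)


# Hypothetical counts: 1,000 cells, 200 tagged in the first context,
# 100 active in the second context, 30 labelled in both.
ratio = overlap_relative_to_chance(n_both=30, n_first=200, n_second=100,
                                   n_total=1000)
print(f"overlap is {ratio:.2f} times the chance level")
```

Normalizing by the product of the two ensemble sizes matters because a larger ensemble in either context would raise the raw overlap even if the cells were chosen independently.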

To determine whether the overlap of neuronal representations links
contextual memories that occurred close in time, such that the recall 
of one is more likely to lead to the recall of the other, we again exposed 
animals to three distinct contexts as described above: A and C were 
separated by 7 days, and B and C were separated by 5h. Two days later, 
mice were placed in C and given an immediate footshock (Fig. 3a). 
Since the neural representations of B and C overlap more than A and 
C (Extended Data Fig. 5), recall of C (shocked context) should lead to 
recall of B (but not A). Therefore, the fear associated with C should 
transfer to B (but not to A). Remarkably, we found that mice tested 
in B, a context in which they had not been shocked, froze as much as 
mice tested in C (shocked context; Fig. 3b). In contrast, mice tested in 
A froze significantly less than mice tested in the other two contexts. 


1Departments of Neurobiology, Psychiatry & Biobehavioral Sciences and Psychology, Integrative Center for Learning and Memory, Brain Research Institute, University of California, Los Angeles, 
Angeles, California 90095, USA. West Los Angeles VA Medical Center, 11301 Wilshire Blvd, Los Angeles, California 90073, USA. 4Department of Neurosciences, University of California, San
Diego, La Jolla, California 92093, USA. 5Veterans Affairs Medical Center, San Diego, California 92161, USA. 6Departments of Cell Biology and Neurosciences, Institute for Childhood and Neglected

Figure 1 | Calcium imaging of CA1 with integrated miniature microscopes
during exploration of different contexts. a, A microendoscope was implanted
directly above CA1 expressing viral GCaMP6f and a baseplate was affixed
onto the skull. A miniature fluorescent wide-field microscope was used
to image CA1 neurons across repeated imaging sessions. b, Top left,
example image of mean fluorescence during context exploration. Top
right, example image of relative fluorescent change (ΔF/F). Bottom left,
cells extracted from ΔF/F. Scale bar represents 100 μm. Bottom right,
example traces of ΔF/F colour coded to represent individual neurons.
c, Experimental design. Mice were imaged while exploring three novel
contexts (A, B, C) separated by 7 days or 5h. d, Representative imaging
during context exploration. Top row, images of mean fluorescence from
each session. Middle row, ensemble of cells active in each session. Bottom
row, cells that were active in two sessions. Scale bar represents 100 μm.
e, There was no difference in the number of cells active across the three 
context explorations (one-way, repeated measures ANOVA, F>,7 = 2.14, not 
significant, n = 8 mice). f, There was an increase in the overlapping ensemble 
when contexts are separated by 5h compared to 7 days (paired t-test, 

t7 = 3.830, P=0.0065, n=8). **P< 0.01. Results show mean + s.e.m. 


These results support the hypothesis that the overlap between neuronal 
representations contextually links memories close in time. 

Next, we tested whether the memories for B and C remain distinct, 
rather than forming a unitary memory. If so, extinction of the fear 
associated with B should not affect recall in C. Again, we exposed 
animals to B and 5h later to C, and then two days later paired C with 
a footshock. Two days after the footshock, the mice were tested in 
either C (shocked context), B (5h; not shocked), or D (novel context; 
Fig. 3c). Consistent with the prior experiment, mice froze similarly in 
C and B, despite never having been shocked in B. However, they froze 
less in a novel context (D; Fig. 3d, Extended Data Fig. 6b), demon- 
strating memory specificity. Next, we carried out repeated exposures 
in either context C, B, or D daily for 5 days. On the final day, the mice 
were tested in C (shocked context). As expected, repeated exposures 
in C (compared to repeated exposures in novel context D) resulted 
in lower freezing during the extinction test (Fig. 3e). Mice that were 
repeatedly exposed to B did not show less freezing in C, demonstrating 
that repeated exposures in B do not cause extinction in C. These results 
demonstrate that although the memories for B and C show consid- 
erable overlap in their ensembles, and recall of B appears to trigger 
recall of C, memories for these two contexts, acquired 5h apart, remain 
distinct. 
Figure 2 | Tagging neural ensembles of contextual memories with 
the TetTag system. a, Schematic design of the TetTag system. Dox, 
doxycycline; tTA, tetracycline-transactivator; TetO, tetracycline response 
element. b, Experimental design. Cells active in context A were tagged 
with GFP and cells active in context B, either 5h or 7 days later, were 
labelled with ZIF immunohistochemistry. c, Representative examples of 
GFP, ZIF, DAPI and merged images of CA1. Scale bar represents 50 μm. 
d, There was no difference between the percentage of cells positive for GFP 
(unpaired t-test, t24 = 0.54, not significant, n = 15, 11 mice). e, There was 
no difference between the percentage of cells positive for ZIF (unpaired 
t-test, t24 = 1.11, not significant, n = 15, 11 mice). f, There was an increase 
in the overlapping ensemble between contexts when spaced 5h apart 
compared to 7 days apart (unpaired t-test, t24 = 2.15, P = 0.0422, n = 15, 
11 mice). The level of the overlapping ensemble for the 5h group was 
above chance (one-sample t-test against 0, t14 = 3.402, P = 0.0043) and at 
chance for the 7 day group (one-sample t-test against 0, t10 = 0.323, not 
significant). *P < 0.05. Results show mean ± s.e.m. 


Recent findings demonstrated that manipulations that enhance 
neuronal excitability can lead to increases in memory strength. We 
found that 5h after exposure to a context, there was an increase in 
excitability in cells that encoded that context (Extended Data Fig. 3c, d). 
Thus, the sharing of the neural ensemble and the increase in excitability 
should result in the strengthening of the memory for a second context 
5h later. To test for modulation of memory strength, mice were exposed 
to B and then exposed to C 5h or 7 days later. Two days later, animals 
received an immediate shock in C. Two days after that, they were tested 
in C. Home cage controls were trained in the same manner, except they 
were not exposed to B (Fig. 3f). Mice trained with the 5h interval had 
enhanced memory for C compared to either mice trained with the 7 day 
interval or home cage controls (Fig. 3g; Extended Data Figs 6c, d and 7). 
Furthermore, this enhancement required NMDA-receptor activity 
(Extended Data Fig. 8). These data support our previous findings and 
indicate that for a period of time (5h, but not 7 days) the processes 
triggered by the encoding of one memory can modulate the strength 
of subsequent memories. 

Taken together, the results presented above demonstrate that the 
overlap between the neuronal ensembles representing two separate 
contextual memories leads to linking of these memories and suggests 
that excitability has a key role in this process. Since CA1 neuronal 
excitability decreases with ageing, we predicted that memory-linking 
processes may be disrupted in older mice. To test this, we 
started by repeating the calcium imaging (Fig. 4a) as well as the TetTag 
experiment (Extended Data Fig. 9e, f) in aged mice. Unlike in young 
adult mice (3-6 months old), in aged mice (14-18 months old) there 
Figure 3 | Memories are contextually linked but distinct. a, Design for 
transfer of fear experiment. Imm. shock, immediate shock; cxt test, context 
test. b, There was a significant difference in freezing between groups 
that were tested in different contexts (A, B, C) for the transfer of fear 
experiment (one-way ANOVA, F2,47 = 4.62, P = 0.01, n = 18, 17, 15 mice). 
There was no difference between freezing in contexts C and B (t47 = 0.42, 
not significant). Animals had less freezing in context A than C (t47 = 2.46, 
P = 0.02) and B (t47 = 2.83, P = 0.007). c, Design for extinction experiment. 
d, There was a significant difference in freezing during the context test 
(one-way ANOVA, F2,57 = 12.99, P < 0.0001, n = 20, 20, 20 mice). There 
was no difference between freezing in contexts C and B (t57 = 0.80, not 
significant). Animals had less freezing in context D than C (t57 = 4.76, 


was no difference between the overlap of neural ensembles encoding 
contexts spaced 5h or 7 days apart (Fig. 4b). This lack of overlap was 
not due to an inability to reliably reactivate the same neural ensemble 
during recall of the same context (Extended Data Fig. 9a, b) or to gen- 
eral contextual memory deficits (Extended Data Fig. 9c, d). 

The results presented above predict that the lack of a shared neural 
representation in aged mice should disrupt memory linking. To test 
this hypothesis, we repeated in aged mice the experiment testing the 
transfer of fear between contexts (Fig. 4c). The results showed that the 
fear associated with C does not transfer to B in aged mice: the freezing 
triggered by B (no shock context) was not different than that observed 
in a novel context, D, and significantly lower than that in C (shocked 
context; Fig. 4d). Similarly, we found that, unlike in young mice, in 
aged mice exposure to B (5h before exposure to C) does not enhance 
memory for C (Fig. 4e, f). Importantly, this was not due to a deficit in 
P < 0.0001) and B (t57 = 3.96, P = 0.0002). e, There was a significant 
difference in freezing during the extinction test (one-way ANOVA, 
F2,57 = 4.79, P = 0.01, n = 20, 20, 20 mice). There were no differences in 
freezing between groups B and D (t57 = 0.81, not significant). Group C 
had less freezing than groups B (t57 = 2.18, P = 0.03) and D (t57 = 2.99, 
P = 0.004). f, Design for enhancement experiment. g, There was a 
significant difference in freezing in the enhancement experiment (one-way 
ANOVA, F2,51 = 9.63, P < 0.001, n = 14, 20, 20 mice). The 5h group had 
more freezing than the home cage (HC) (t51 = 3.98, P = 0.0002) and 7 day 
(t51 = 3.45, P = 0.001) groups. There was no difference between home cage 
or 7 d groups (t51 = 0.86, not significant). *P < 0.05, **P < 0.01. Results 
show mean ± s.e.m. 




learning of a single context, since when trained with a single context 
the performance of aged mice was indistinguishable from that of young 
mice (Extended Data Fig. 9c, d). Furthermore, the differences between 
young and aged mice were also not due to strain differences, as we 
replicated the transfer and enhancement experiments with young mice 
from the same genetic background as the aged mice (Extended Data 
Fig. 6). Altogether, these results strongly support the role of neuronal 
excitability in linking distinct contextual memories encoded close in 
time, as aged mice exposed to two contexts close in time did not show 
the increased overlap between ensembles which presumably led to the 
lack of both the transfer of fear between contexts and the strengthening 
of the second memory. 

To increase neuronal excitability and rescue the memory-linking 
deficit in aged mice, we injected a lentivirus to express hM3Dq designer 
receptors exclusively activated by designer drugs (DREADD) tagged 
Figure 4 | Age-related deficits in memory linking are rescued by 
ensemble activation. a, Design for calcium imaging with miniature 
microscope in aged mice. b, There was no difference in the overlapping 
ensemble between the 5h and 7 day groups (paired t-test, t3 = 0.367, not 
significant, n = 4). c, Design for transfer of fear experiment. d, There 
was a significant difference in freezing during the context test (one-way 
ANOVA, F2,47 = 8.083, P = 0.001, n = 19, 15, 16 mice). There was no 
difference between freezing levels in contexts B and D (t47 = 0.35, not 
significant). Animals had more freezing in context C than B (t47 = 3.19, 
P = 0.0025) and D (t47 = 3.619, P = 0.0007). e, Design for behavioural 
enhancement experiment. f, There was no difference in freezing 
between groups (one-way ANOVA, F2,39 = 0.453, not significant, n = 15, 
15, 12 mice). g, Design for memory linking rescue by activating cells 
with DREADD receptors. h, There was higher freezing in the CNO 
group compared to the saline-injected (SAL) group (unpaired t-test, 
t31 = 2.36, P = 0.02, n = 12, 21 mice). *P < 0.05, **P < 0.01. Results show 
mean ± s.e.m. 
with GFP in a sparse population of dorsal CA1 neurons (Extended 
Data Fig. 10a, b). Clozapine-N-oxide (CNO) increases excitability and 
activates cells that express the DREADD receptors (Extended Data 
Fig. 10c, d). To bias the allocation of the two contextual memories so 
that they shared an overlapping neural ensemble, we injected CNO 
before both learning experiences, spaced 5h apart (Fig. 4g). The control 
group was given a saline injection before the first exploration and a 
CNO injection before the second exploration. To test the behavioural 
consequences of sharing a neural ensemble, mice were brought back 
two days later for an immediate shock in the second context. Two days 
later, mice were tested in the first (non-shocked) context to assess 
their transfer of fear. The CNO group froze more than the saline- 
injected group in the non-shocked context (Fig. 4h). This was not due 
to increased anxiety caused by CNO (Extended Data Fig. 10e, f). Thus, 
increasing neuronal excitability in aged mice rescued the memory- 
linking deficit. 

Mechanisms that link memories are critically important for organ- 
izing the enormous number of related memories stored throughout 
a lifetime. Our results support the memory allocation hypothesis 
and are consistent with human data and computational modelling, 
suggesting that memories encoded within close temporal proximity 
are more likely to be co-recalled than memories encoded across more 
distant time frames. Our data indicate that overlapping populations of 
CA1 neurons serve to link and strengthen memories, thus facilitating 
integrated recall of experiences encoded close in time while separating 
those encoded further in time. Temporary increases in excitability 
probably represent one of a family of mechanisms (synaptic tagging 
and capture is another example) that structure the acquisition and 
storage of information to facilitate future use and recall. Alteration 
of these processes, such as decreases in neuronal excitability during 
ageing, could affect the organization of memory thus impairing 
efficient recall of related information. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 


Subjects. All experimental protocols were approved by the Chancellor’s Animal 
Research Committee of the University of California, Los Angeles, in accordance 
with NIH guidelines. Adult C57Bl/6NTac, C57Bl/6NTac × 129S6/SvEvTac and 
C57Bl/6NIA male mice were singly housed on a 12h light/dark cycle. Young adult 
mice were 3-6 months old, and aged adult mice were 14-18 months old. TetTag mice 
were generated by crossing transgenic mice that express a histone 2B-GFP fusion 
protein controlled by the tetO promoter (strain Tg(tetO-HIST1H2BJ/GFP)47Efu/J; 
stock number 005104; Jackson Laboratory) with mice that express tetracycline- 
transactivator (tTA) protein under control of the c-fos (also known as Fos) 
promoter. TetTag mice were maintained in a C57BL/6N background. Mice were born 
and raised on doxycycline (dox) chow (40 mg kg⁻¹) to prevent GFP expression 
before experimental manipulations. To open the window for activity-dependent 
labelling, dox chow was replaced with regular chow for 3 days before the start of 
an experiment. Expression of new GFP was shut off by administration of high dox 
chow (1 g kg⁻¹). Memory linking (transfer of fear and enhancement) experiments 
were conducted with both C57Bl/6NTac × 129S6/SvEvTac and C57Bl/6NIA mice. 
Viral construct. AAV1.Syn.GCaMP6f.WPRE.SV40 virus (titre: 4.65 × 10¹³ GC per ml) 
was purchased from Penn Vector Core. The hM3Dq vector was derived from the 
CaMK2a.hM4Di.T2A.EGFP/CREB plasmid*°. The hM4Di.T2A.EGFP/CREB 
in that plasmid was replaced by hM3Dq.T2A.EGFP/dTomato. The HA-tagged 
hM3Dq and dTomato-tagged EGFP are expressed under the CaMK2a promoter 
and cloned on either side of a T2A self-processing viral peptide. Vesicular- 
stomatitis-virus-G-protein-pseudotyped lentiviral vectors were produced by 
calcium-phosphate-mediated transient transfection of human embryonic kidney 
293 T (HEK293T) cells, as previously described. Lentivirus vectors were titred on 
HEK293T cells based on EGFP expression (titre: 6 x 10° cells per ml). 

Surgery. Mice were anaesthetized with 1.5 to 2.0% isoflurane for surgical pro- 
cedures and placed into a stereotactic frame (David Kopf Instruments, Tujunga, 
CA). Lidocaine (2%; Akorn, Lake Forest, Illinois) was applied to the sterilized 
incision site as an analgesic, while subcutaneous saline injections were admin- 
istered throughout each surgical procedure to prevent dehydration. In addition, 
carprofen (5 mg kg⁻¹) and dexamethasone (0.2 mg kg⁻¹) were administered both 
during surgery and for 7 days post-surgery with amoxicillin. 

For calcium imaging experiments, mice underwent two separate surgi- 
cal procedures. First, mice were unilaterally microinjected with 500 nl of 
AAV1.Syn.GCaMP6f.WPRE.SV40 virus at 50 nl min⁻¹ into the dorsal CA1 using 
the stereotactic coordinates: —2.1 mm posterior to bregma, 2.0 mm lateral to mid- 
line and —1.65 mm ventral to skull surface. Two weeks later, the microendoscope 
(a gradient refractive index lens) was implanted above the previous injection site. 
For the procedure, a 2.0mm diameter circular craniotomy was centred 0.5mm 
medial to the virus injection site. Artificial cerebrospinal fluid (ACSF) was 
repeatedly applied to the exposed tissue to prevent drying. The cortex directly 
below the craniotomy was aspirated with a 27-gauge blunt syringe needle attached 
to a vacuum pump. The microendoscope (0.25 pitch, 0.50 NA, 2.0 mm in diameter 
and 4.79 mm in length, Grintech GmbH) was slowly lowered with a stereotaxic 
arm above CA1 to a depth of 1.35 mm ventral to the surface of the skull at the 
most posterior point of the craniotomy. Next, a skull screw was used to anchor the 
microendoscope to the skull. Both the microendoscope and skull screw were fixed 
with cyanoacrylate and dental cement. Kwik-Sil (World Precision Instruments) 
covered the microendoscope. Two weeks later, a small plastic baseplate was 
cemented onto the animal's head atop the previously formed dental cement. Debris 
was removed from the exposed lens with double-distilled H₂O, lens paper and 
forceps. The microscope was placed on top of the baseplate and locked in a position 
in which the field of focus was in view, so that cells and visible landmarks, such as 
blood vessels, appeared sharp and in focus. Finally, a plastic cover was fit into the 
baseplate and secured by magnets. 

For aged DREADD experiments, mice were bilaterally microinjected with 700 nl 
of lentivirus CaMK2.hM3Dq.T2A.EGFP/dTomato virus at 100 nl min⁻¹ into the 
dorsal CA1 using the stereotactic coordinates: −1.80 mm posterior to bregma, 
±1.50 mm lateral to midline, −1.60 mm ventral to skull surface; −2.50 mm 
posterior to bregma, ±2.00 mm lateral to midline, −1.70 mm ventral to skull surface. 
Drug injections. Clozapine-N-oxide (CNO; Enzo Life Sciences) was made in 
a stock solution of 0.5 mg ml⁻¹ in DMSO and then diluted in saline to the desired 
concentration. CNO was injected (i.p.) at a dose of 0.5 mg kg⁻¹ 45 min before 
behavioural manipulation. MK-801 (Sigma-Aldrich) was diluted in saline and 
injected (i.p.) at a dose of 0.1 mg ml⁻¹ 30 min before behavioural manipulation. 
Saline was used as the vehicle. 

Behavioural procedures. Prior to all experiments, mice were handled for one 
minute in the vivarium each day for three days. Then, mice were habituated to 
transportation and external environmental cues by being carted out of the vivar- 
ium into the experimental rooms and handled for one minute in the experimental 
room each day for five days before the experiment. For within-subject experiments, 
mice explored three different contexts, separated by 7 days or 5 hours. Exploration 
duration of each context was ten minutes (C57Bl/6NTac and C57Bl/6NIA strains) 
or five minutes (C57Bl/6NTac × 129S6/SvEvTac strain). Contexts were 
counterbalanced. For between-subject experiments, mice explored two contexts 
separated by either 7 days or 5h. The area of each context was approximately 800 cm². 
The shape (circular, triangular, square), scent (simple green, omega, alcohol) and 
visual cues (white plastic walls/opaque textured flooring, black acrylic walls/ 
white acrylic flooring, metal walls/metal grid flooring) were different for each 
context. For immediate shock (imm shock), mice were placed in the chamber 
with a baseline of 10s (0.7 mA) (C57Bl/6NTac and C57Bl/6NIA strains) or 
6s (C57Bl/6NTac × 129S6/SvEvTac strain) followed by a 2s shock (0.7 mA, 
C57Bl/6NTac and C57Bl/6NIA strains; 0.5 mA, C57Bl/6NTac × 129S6/SvEvTac 
strain). Thirty seconds after the shock, mice were placed back in their home cage. 
For context tests (cxt test), mice were returned to the designated context. For 
extinction (extinct) trials, mice were placed in a context for five minutes without 
shock. Freezing (the cessation of all movement except for respiration), was assessed 
via an automated scoring system (Med Associates) with 30 frames per second 
sampling; the mice needed to freeze continuously for at least one second before 
freezing could be counted. Experimental groups and contexts were 
counterbalanced across the within-subjects design. For between-subjects design, animals 
were randomly assigned to groups. 
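
The freezing criterion described above (30 frames per second, at least one second of continuous immobility) can be illustrated with a minimal Python sketch. This is not the Med Associates scoring software; the function name `percent_freezing` and the per-frame boolean input `motionless` are hypothetical, and how each frame is classified as motionless is assumed to be done upstream.

```python
def percent_freezing(motionless, fps=30, min_duration_s=1.0):
    """Percentage of frames spent freezing.

    motionless: per-frame booleans (True = no movement detected).
    A run of motionless frames counts as freezing only if it lasts
    at least min_duration_s (assumed criterion: >= 1 s at 30 fps).
    """
    frames = list(motionless)
    min_frames = int(round(fps * min_duration_s))
    frozen_frames = 0
    run_length = 0
    # A trailing False acts as a sentinel that closes the final run.
    for is_still in frames + [False]:
        if is_still:
            run_length += 1
        else:
            if run_length >= min_frames:
                frozen_frames += run_length
            run_length = 0
    return 100.0 * frozen_frames / len(frames) if frames else 0.0


# Example: a 45-frame bout counts as freezing, a 20-frame bout does not.
session = [True] * 45 + [False] * 15 + [True] * 20 + [False] * 20
print(percent_freezing(session))
```

In this sketch, bouts shorter than the minimum duration contribute nothing, so brief pauses in movement are not scored as freezing.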

Integrated miniature microscope data acquisition and analyses. Digital imaging 
data was sent from the CMOS imaging sensor (Aptina, MT9V032) to custom data 
acquisition (DAQ) electronics and USB host controller (Cypress, CYUSB3013) 
over a lightweight, highly flexible cable. The electronics packaged the data to com- 
ply with the USB video class (UVC) protocol and then transmit the data over 
Super Speed USB to a PC running custom DAQ software. The DAQ software was 
written in C++ and uses Open Computer Vision (OpenCV) libraries for image 
acquisition. Images are acquired at 30 frames per second and recorded to uncom- 
pressed .avi files. The DAQ software simultaneously records animal behaviour, 
time stamping both video streams for offline alignment. 

Our analysis suite, written in MATLAB, processes the raw videos and extracts 
relevant experimental information. Initial processing of calcium imaging data 
corrected column-wise ADC variation, removed small movement artefacts using 
an amplitude-based image registration algorithm, and calculated the mean 
fluorescence per pixel for conversion to ΔF/F. A fully automated segmentation 
algorithm identified and segmented pixels of active cells. The algorithm steps 
through the recorded calcium imaging video detecting pixel locations of local 
maxima of fluorescence which met a minimum ΔF/F criterion. For each of these 
pixel locations, an iterative process was used to group together neighbouring 
pixels based on the correlation of each pixel's fluorescence time trace (5s window 
around the local maximum of the fluorescence event) with the mean time trace of 
the pixel group in the previous iterative step. Pixels with high correlation (0.95) were added 
to the group and the process was repeated until the total number of pixels in the 
group no longer changed. Cells whose centres were within 7 μm of each other or 
whose pixels overlapped by at least 80% were merged together. Once cells were 
segmented, we extracted ΔF/F traces and removed crosstalk between neighbouring 
cells. Crosstalk was removed by first detecting calcium transients across all 
cells and then keeping only the largest event within a 30 μm radius of the cell they 
were associated with. Calcium events were calculated by first filtering the ΔF/F 
(2-pole Butterworth low-pass filter: 0.3 Hz) to remove noise. Peaks in the filtered 
ΔF/F trace above 0.05 ΔF/F were detected and a window was calculated from 
the onset of the peak to the return back to baseline. If this window was greater 
than one second, it was counted as an event. Recordings from multiple sessions 
of the same animal were aligned using the same amplitude-based registration 
algorithm used for within-session registration, except the algorithm was only 
applied to the mean frame from each session. Once two sessions were registered, 
cells across two sessions were matched to each other using a distance measure 
(centres within 5 μm of each other). 
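
As an illustration of the final cross-session matching step, the following Python sketch pairs cell centres from two registered sessions when they lie within 5 μm of each other. It is not the MATLAB suite itself: the function name `match_cells`, the coordinate format and the greedy closest-pair-first, one-to-one assignment rule are assumptions made for this example, since the text specifies only the distance threshold.

```python
import math


def match_cells(centres_a, centres_b, max_dist_um=5.0):
    """Match cells between two registered sessions by centre distance.

    centres_a, centres_b: lists of (x, y) cell centres in micrometres.
    Returns sorted (index_a, index_b) pairs. Pairs are assigned greedily,
    closest first, and each cell is used at most once (an assumption).
    """
    candidates = []
    for i, (xa, ya) in enumerate(centres_a):
        for j, (xb, yb) in enumerate(centres_b):
            dist = math.hypot(xa - xb, ya - yb)
            if dist <= max_dist_um:
                candidates.append((dist, i, j))
    candidates.sort()

    used_a, used_b, matches = set(), set(), []
    for _dist, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            matches.append((i, j))
    return sorted(matches)


# Example: two of three cells reappear within 5 um in the second session.
session_1 = [(0.0, 0.0), (10.0, 10.0), (50.0, 50.0)]
session_2 = [(1.0, 1.0), (12.0, 10.0), (100.0, 100.0)]
print(match_cells(session_1, session_2))
```

The number of matched pairs, divided by the number of active cells in a session, gives one possible measure of ensemble overlap between the two sessions.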

Code availability. The MATLAB analysis suite, as described above, is available for 
download at http://www.miniscope.org. This Wiki site is our open-source platform 
for sharing access to all of our associated software and hardware files for imple- 
menting our miniature microscope. 

Confocal imaging and histological analysis. Forty-five minutes after exploration 
of a context, mice were transcardially perfused with 4% PFA, followed by 24h 
post-fixation in the same solution. Free-floating 50-μm coronal sections were 
prepared using a vibratome. Sections were incubated in blocking solution containing 
0.2% Triton X, 10% normal goat serum in 0.1 M phosphate buffer for at least 1h 
at room temperature. Then the sections were incubated in the blocking solution 
with anti-EGR-1 rabbit primary antibody (Cell Signaling; 1:750 dilution for 24h at 
4°C). After a series of 0.1 M phosphate buffer washes, sections were stained using 
the same blocking solution as above and Alexa Fluor 568 goat anti-rabbit secondary 
antibody (Jackson Immuno Research; 1:500 dilution for 2h at room temperature). 
Finally, sections were stained with DAPI (Invitrogen; 1:1,000 dilution for 15 min) 
and mounted on slides. 

Sections from −1.8 mm to −2.2 mm posterior to bregma were imaged at 20× 
magnification using a Nikon C2 or A1 confocal microscope. All imaging was done 
using standardized laser settings, held constant for samples from the same exper- 
imental data set. Cells were manually counted by a blinded rater. Images were 
quantified from 1-4 sections per animal. The percentage of DAPI-labelled cells 
containing GFP, ZIF, or both was calculated for each image and then averaged 
to produce a single measurement for each animal. To normalize for chance, we 
subtracted chance (GFP/DAPI) x (ZIF/DAPI) x 100 from the observed overlap 
(GFP and ZIF)/DAPI x 100 and then divided by chance. 
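
The chance normalization described above can be written out as a short Python sketch. The function name `normalized_overlap` and the use of raw cell counts as inputs are assumptions for illustration; in the procedure described, percentages were computed per image and then averaged per animal.

```python
def normalized_overlap(n_gfp, n_zif, n_both, n_dapi):
    """Chance-normalized overlap of GFP+ and ZIF+ cells.

    chance   = (GFP/DAPI) x (ZIF/DAPI) x 100
    observed = (GFP and ZIF)/DAPI x 100
    result   = (observed - chance) / chance
    A value of 0 means the overlap equals what independence predicts.
    """
    chance = (n_gfp / n_dapi) * (n_zif / n_dapi) * 100
    observed = n_both / n_dapi * 100
    return (observed - chance) / chance


# Example with made-up counts: 10% GFP+, 20% ZIF+, 3% double-labelled.
# Chance overlap is 2%, so the observed overlap is 50% above chance.
print(normalized_overlap(n_gfp=100, n_zif=200, n_both=30, n_dapi=1000))
```

Dividing by chance expresses the overlap relative to the level expected if the two labels were independent, which makes animals with different overall labelling rates comparable.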
Electrophysiology. Mice were anaesthetized with a cocktail (3 ml kg⁻¹) containing 
ketamine (25 mg ml⁻¹), xylazine (1.3 mg ml⁻¹), and acepromazine (0.25 mg ml⁻¹) 
and perfused for 3 min with ice-cold, oxygenated, sucrose ACSF containing 
(in mM) 83 NaCl, 2.5 KCl, 3.3 MgSO₄, 0.5 CaCl₂, 1 NaH₂PO₄, 26.2 NaHCO₃, 
22 glucose, and 72 sucrose (~315 mOsm/l, pH 7.4). The brain was rapidly dissected 
and 300-μm-thick coronal slices were collected and transferred to an interface 
chamber containing the same modified sucrose ACSF solution and incubated 
at 34 °C for 30 min. Slices were then held at room temperature (23 °C) in the 
interface chamber for at least 45 min before initiating recordings. Recordings 
were made in a submersion-type recording chamber and perfused with 
oxygenated ACSF containing (in mM) 119 NaCl, 2.5 KCl, 1.3 MgCl₂, 2.5 CaCl₂, 
1.3 NaH₂PO₄, 26.0 NaHCO₃, 20 glucose (~295 mOsm/l) at 23 °C at a rate of 
1-2 ml per minute. 

All recordings were performed within the CA1 region of the hippocampus. 
Neurons were selected based on emission spectra (GFP+ or GFP−), and were 
then visualized under infrared differential interference contrast video microscopy 
(Olympus BX-51 scope and Rolera XR digital camera). Whole-cell recordings 
were made at room temperature using pulled patch pipettes (5-6 MΩ) filled with 
internal solution containing (in mM) 150 K-Gluconate, 1.5 MgCl₂, 5.0 HEPES, 
1 EGTA, 10 phosphocreatine, 2.0 ATP, and 0.3 GTP. Recordings were obtained 
using Multiclamp 700B patch amplifiers (Molecular Devices) and data ana- 
lysed using pClamp 10 software (Molecular Devices). Data were acquired from 
cells requiring less than —100 pA to hold at a membrane potential of —70 mV. 
Current-spike relationship was determined with a series of depolarizing current 
steps applied for 500 ms in 10 pA increments at 5 s intervals. 

Statistical analysis. GraphPad Prism version 6.00 (GraphPad Software, La Jolla, 
California, USA) was used for statistical analyses. Statistical significance was 
assessed by two-tailed paired Student's t-tests, two-tailed unpaired Student's t-tests, 
one-way ANOVA, or two-way ANOVA where appropriate. Significant effects or 
interactions were followed up with post hoc testing with the use of Fisher’s least 
significant difference (LSD) where specified in the figure legends. Significance 
levels were set to P=0.05. Significance for comparisons: *P < 0.05; **P < 0.01; 
***P < 0.001. Sample sizes were chosen on the basis of previous studies. No sta- 
tistical methods were used to predetermine sample size. Data met assumptions of 
statistical tests, and variance was similar between groups for all metrics measured. 
The investigators were blinded to conditions and drugs during experiments and 
outcome assessment. 
Extended Data Figure 1 | Stability of fluorescence and overlap. 
a, Average normalized mean fluorescence within session. There was no 
difference between the mean fluorescence across the 3 sessions (one-way 
repeated measures ANOVA, F2,7 = 0.423, not significant). b, Average 
normalized mean fluorescence within session. There was no difference 
between the mean fluorescence across a 10-min session (one-way repeated 
measures ANOVA, Fo,x2 = 1.108, not significant). Results show mean ± s.d. 
c, Higher ensemble overlap with 5h interval than 7 days. Normalized 
ensemble overlap is calculated as the ensemble overlap between contexts 
separated by 5h divided by the ensemble overlap between contexts 
separated by 7 days. A normalized overlap value of 1 signifies that there is 
no difference between the overlap at 5h and 7 days. The minimum number 
of calcium events required from each cell for the cell to be considered 
‘active’ (inclusion criteria) was systematically increased and the ratio of the 
ensemble overlap for the different context was calculated. For all inclusion 
criteria, there is higher ensemble overlap with a 5h, rather than 7 day, 
interval (one-sample t-test against 1, (1) t7 = 3.00, P = 0.02, (2) t7 = 2.57, 
P = 0.04, (3) t7 = 2.42, P = 0.04, (4) t7 = 2.50, P = 0.04, (5) t7 = 2.32, 
P = 0.05). Results show mean ± s.e.m. 
Extended Data Figure 2 | Neural ensembles of environments are reliably 
reactivated at recall of an open field and linear track. a, Experimental 
design. Mice were imaged while exploring contexts A and B separated 

by 7 days and imaged while exploring contexts C and C separated by 

7 days. b, There was a higher percentage of cells reactivated when animals 
explored the same context (C-C) than when animals explored different 
contexts (A-B) (paired t-test, t3 = 6.305, P=0.0081, n =4 mice). c, Mice 
were trained to run on a 2-m linear track with the miniature microscope 
for water rewards. Mice were trained 3 days a week for 3 weeks with 

a delay interval of 2-3 days between each session. Place fields were 
calculated by deconvolving calcium ΔF/F traces with an exponential 

to extract approximate spike times. Spikes that remained after crosstalk 
removal were included for analysis. Animal position was extracted using 
an automated LED tracking algorithm. A speed threshold (3 cm s⁻¹) was 
applied to both the animal position and extracted spike timing and the 
resulting data was spatially binned (6.5-cm bins). Spatial firing rates were 
calculated by dividing the binned spike counts by the binned occupancy 


and smoothing with a Gaussian filter (sigma = 6.5 cm). Cells which 
showed consistent spatial firing modulation on at least three trials, with 
all other trials showing no bursting activity, were considered as place 
cells. Normalized spatial firing rates of all matched cells independently 
meeting the place cell criteria for both days. The data are pooled across 

3 mice and include both motion directions. Place fields are ordered by 
centroid location on session 2. d, A shift of the image registration between 
sessions results in a decrease in matched place cells. A translational shift 
both horizontally and vertically was applied to the image registration 
transformation used in A. Cells were then matched across days and those 
which met our place cell criteria were kept. The heat map shows the 
count of matched place cells with a centroid shift of the place field that 

is less than 33 cm. Optimum matching of cells occurred within a 1-pixel 
translation of the calculated alignment transformation. e, Distribution of 
centroid shifts of place fields shown in A compared to the null hypothesis 
that the cell matching between sessions matches random cells. 
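
The rate-map steps described in panel c (speed threshold, 6.5-cm spatial bins, spike counts divided by occupancy, Gaussian smoothing with sigma = 6.5 cm) can be sketched as follows. This is a minimal illustration only, not the authors' code: the function name, the per-frame boolean spike mask, the frame interval `dt_s` and the 3-sigma kernel truncation are all assumptions made for the example.

```python
import numpy as np

def spatial_firing_rate(positions_cm, speeds_cm_s, spike_mask, dt_s,
                        track_length_cm=200.0, bin_cm=6.5,
                        sigma_cm=6.5, min_speed_cm_s=3.0):
    """Occupancy-normalised, Gaussian-smoothed firing rate along a linear track.

    positions_cm, speeds_cm_s and spike_mask hold one value per imaging frame
    (hypothetical input layout); dt_s is the frame interval in seconds.
    """
    positions = np.asarray(positions_cm, dtype=float)
    speeds = np.asarray(speeds_cm_s, dtype=float)
    spikes = np.asarray(spike_mask, dtype=bool)

    # Apply the speed threshold to both position samples and spike times.
    moving = speeds >= min_speed_cm_s

    # Spatial bins of fixed width covering the whole track.
    edges = np.arange(0.0, track_length_cm + bin_cm, bin_cm)
    occupancy_frames, _ = np.histogram(positions[moving], bins=edges)
    occupancy_s = occupancy_frames * dt_s
    spike_counts, _ = np.histogram(positions[moving & spikes], bins=edges)

    # Binned spike counts divided by binned occupancy (0 where never visited).
    rate = np.divide(spike_counts, occupancy_s,
                     out=np.zeros(len(occupancy_s)), where=occupancy_s > 0)

    # Gaussian smoothing; the kernel is truncated at 3 sigma (an assumption).
    sigma_bins = sigma_cm / bin_cm
    half_width = int(np.ceil(3 * sigma_bins))
    x = np.arange(-half_width, half_width + 1)
    kernel = np.exp(-0.5 * (x / sigma_bins) ** 2)
    kernel /= kernel.sum()
    return edges, np.convolve(rate, kernel, mode="same")
```

The bin centres are `0.5 * (edges[:-1] + edges[1:])`; the centroid used to order place fields would be computed from the returned rate vector, which the caption does not specify further.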
Extended Data Figure 3 | Five hours after exploration of a context,
GFP expression is shut off by doxycycline and excitability is increased.
a, Experimental design. Mice were removed from low levels of dox
(40 mg kg⁻¹) and given regular chow for 3 days to open up the GFP
tagging window. After receiving administration of high dox (1 g kg⁻¹)
for 5h, mice were injected with 30 mg kg⁻¹ of pentylenetetrazole (PTZ),
exposed to a novel context or left in their home cage (HC). An hour later,
mice were transcardially perfused and processed for GFP expression.
b, There was no difference in GFP expression between the three
groups (one-way ANOVA, F2,5 = 0.04, not significant, n = 3, 3, 2 mice),
demonstrating that 5h was enough time for dox (1 g kg⁻¹) to suppress
expression of new GFP. c, To test learning-related excitability
changes, mice explored a novel context and then were administered high
dox to shut off new GFP. Five hours later, mice were euthanized for
in vitro slice physiology. d, A two-way repeated measures ANOVA
(group × current step) had a significant main effect of group (F2,68 = 4.20,
P < 0.05, n = 21, 29, 21 cells). The 5h GFP+ group had more spikes than
the 5h GFP− group (t68 = 2.31, P < 0.05) and home cage GFP− (t68 = 2.72,
P < 0.05). There was no difference between the 5h GFP− and home cage
GFP− groups (t68 = 0.61, not significant). Results show mean ± s.e.m.
Extended Data Figure 4 | Time course for neuronal overlap and
behavioural linking. a, Design for Ca²⁺ imaging of neuronal overlap
experiment. b, There was a significant difference in overlap across groups
(one-way repeated measures ANOVA, F2,12 = 12.43, P = 0.002, n = 7 mice).
There was more overlap at 5h than 2 days (t12 = 3.03, P = 0.01) and 7 days
(t12 = 4.72, P = 0.0005). c, Design for transfer of fear experiment. d, There
was a significant difference in freezing across groups (one-way ANOVA,
F2,43 = 3.55, P = 0.04, n = 20, 14, 12 mice). There was more freezing at 5h
than 2 days (t43 = 2.13, P = 0.04) and 7 days (t43 = 2.31, P = 0.03).
e, Design for enhancement experiment. f, There was a significant
difference in freezing across groups (one-way ANOVA, F2,45 = 6.38,
P = 0.004, n = 22, 14, 12 mice). There was more freezing at 5h than 2 days
(t45 = 2.45, P = 0.02) and 7 days (t45 = 3.32, P = 0.002). Results show
mean ± s.e.m.
Extended Data Figure 5 | Calcium imaging during retrieval. a, Design for Ca²⁺ imaging of neuronal overlap at retrieval. Order of contexts during
retrieval was counterbalanced. b, There was higher overlap of the neuronal ensemble at 5h than 7 days (paired t-test, t7 =2.55, P=0.04, n=8 mice). 
Extended Data Figure 6 | Replication of memory linking experiments
in young (3-6 months old) C57BL/6NIA mice. a, Design for transfer of
fear experiment. b, There was a significant difference in freezing across the
groups (one-way ANOVA, F2,20 = 9.49, P = 0.001, n = 8, 7, 8 mice). There
was no difference between freezing levels in context C or B (t20 = 0.99,
not significant). Animals had less freezing in context D than C (t20 = 4.19,
P = 0.0004) and B (t20 = 3.06, P = 0.006). c, Design for enhancement
experiment. d, There was a significant difference in freezing (one-way
ANOVA, F2,46 = 4.071, P = 0.023, n = 16, 17, 16 mice). The 5h group had
more freezing than the home cage (HC) group (t45 = 2.72, P = 0.0278)
and 7 day group (t45 = 2.612, P = 0.012). There was no difference between
home cage or 7 day groups (t45 = 0.335, not significant). Results show
mean ± s.e.m.


© 2016 Macmillan Publishers Limited. All rights reserved 


LETTER 


[Figure panels: a, experimental design (context exploration, immediate shock, context test); b, freezing (%) for the 5h-CC, 7d-CC, 5h-BC and 7d-BC groups.]

Extended Data Figure 7 | Exploring the same context twice enhances
memory regardless of time. a, Experimental design. b, There was a
significant difference in freezing (one-way ANOVA, F3,44 = 2.92, P = 0.04,
n = 10, 11, 13, 14 mice). Consistent with the prior experiment, there was
more freezing in the 5h BC than the 7 day BC group (t44 = 2.19, P < 0.05).
The 7 day BC group also had more freezing than the 5h CC (t44 = 2.35,
P < 0.05) and 7 day CC (t44 = 2.48, P < 0.05) groups; however, there was
no difference between the 5h CC and 7 day CC (t44 = 0.06, not significant)
and 5h CC and 5h BC (t44 = 0.31, not significant) groups. Results show
mean ± s.e.m.
Extended Data Figure 8 | NMDA receptor activity is required for
overlap of neural ensembles and behavioural enhancement. a, Design
for Ca²⁺ imaging of neuronal overlap with MK-801 or saline. b, There
was no difference in the number of cells active during exploration of the
first context between saline-injected (SAL) and MK-801 groups (unpaired
t-test, t6 = 0.58, not significant, n = 4, 4). c, There was lower overlap
of the neuronal ensemble in the MK-801 group than in the SAL group
(paired t-test, t3 = 3.45, P = 0.04, n = 4 mice). d, Design for behavioural
enhancement experiment. e, There was lower freezing in the MK-801 than
in the SAL group (unpaired t-test, t22 = 2.65, P = 0.015, n = 12, 12 mice).
f, Design for behavioural control experiment. g, There was no difference in
freezing between SAL and MK-801 groups (unpaired t-test, t22 = 0.22, not
significant, n = 12, 12 mice). Results show mean ± s.e.m.
Extended Data Figure 9 | Control experiments for aged mice. a, Design
for experiment of recall for single contextual experience. b, There was
no difference in reactivation of cells between young and old mice during
recall (unpaired t-test, t6 = 0.59, not significant, n = 4, 4 mice). c, Design
for experiment with single context pre-exposure in young and old mice.
d, There was no difference in freezing behaviour to exposures of a single
context (unpaired t-test, t29 = 0.24, not significant, n = 16, 15 mice).
e, Design for replication of TetTag experiment in old mice. f, There was no
difference in the levels of overlapping ensembles between the 5h and 7 day
groups (unpaired t-test, t6 = 0.06, not significant, n = 3, 5 mice). Results
show mean ± s.e.m.
Extended Data Figure 10 | CNO activates cells with DREADD
receptors and does not increase anxiety in aged mice. a, Mice
infected with DREADD virus in CA1 were injected with saline (SAL) or
clozapine-N-oxide (CNO) and then euthanized 90 min post-injection
for immunofluorescence staining. b, There was no difference in the
percentage of DREADD-positive cells (labelled with GFP) between SAL
and CNO groups (unpaired t-test, t7 = 0.01, not significant, n = 3, 6
mice). c, DREADD-positive cells (labelled with GFP) had more ZIF when
injected with CNO than SAL (unpaired t-test, t7 = 5.08, P = 0.002).
d, Representative examples of ZIF, DREADD, DAPI as well as merged
images of CA1. e, Design for elevated plus maze experiment in aged
mice with DREADD virus. f, A two-way ANOVA showed no main effect
of injection (F1,9 = 0.75, not significant, n = 6, 5 mice) and a significant
main effect of arms (F1,9 = 71.03, P < 0.0001). There was no significant
interaction between injection and arms (F1,9 = 0.003, not significant).
Results show mean ± s.e.m.
Pitx2 promotes heart repair by activating the
antioxidant response after cardiac injury

Ge Tao, Peter C. Kahr, Yuka Morikawa, Min Zhang, Mahdis Rahmani, Todd R. Heallen, Lele Li, Zhao Sun, Eric N. Olson,
Brad A. Amendt & James F. Martin


Myocardial infarction results in compromised myocardial function 
and heart failure owing to insufficient cardiomyocyte self-renewal’. 
Unlike many vertebrates, mammalian hearts have only a transient 
neonatal renewal capacity”. Reactivating primitive reparative 
ability in the mature mammalian heart requires knowledge of 
the mechanisms that promote early heart repair. By testing an 
established Hippo-deficient heart regeneration mouse model for 
factors that promote renewal, here we show that the expression 
of Pitx2 is induced in injured, Hippo-deficient ventricles. Pitx2- 
deficient neonatal mouse hearts failed to repair after apex 
resection, whereas adult mouse cardiomyocytes with Pitx2 gain- 
of-function efficiently regenerated after myocardial infarction. 
Genomic analyses indicated that Pitx2 activated genes encoding 
electron transport chain components and reactive oxygen species 
scavengers. A subset of Pitx2 target genes was cooperatively 
regulated with the Hippo pathway effector Yap. Furthermore, 
Nrf2, a regulator of the antioxidant response’, directly regulated 
the expression and subcellular localization of Pitx2. Pitx2 mutant 
myocardium had increased levels of reactive oxygen species, while 
antioxidant supplementation suppressed the Pitx2 loss-of-function 
phenotype. These findings reveal a genetic pathway activated by 
tissue damage that is essential for cardiac repair. 

We used immunofluorescence staining to look for developmental 
transcription factors that were upregulated in regenerating Hippo- 
deficient hearts*. Paired-like homeodomain transcription factor 2 
(Pitx2) was enriched in border zone ventricular cardiomyocyte nuclei 
of adult Hippo-deficient mouse hearts after myocardial infarction 
(Fig. 1a-c). Pitx2 encodes three isoforms (Pitx2a, Pitx2b and Pitx2c). It
functions in left-right asymmetric organ development and is mutated
in Rieger syndrome, which is characterized by craniofacial, umbilical
and cardiac abnormalities. Notably, Pitx2 deficiency results in
predisposition to the common human arrhythmia atrial fibrillation.
Pitx2c is the major isoform expressed in heart.

Figure 1 | Pitx2 is induced in injured myocardium. a-c, Border zone
of Salv CKO (b) and control (a) hearts stained for Pitx2 (green), cTnT
(red), and DAPI (blue), at 10 days after myocardial infarction (DPMI,
days post-myocardial infarction), with the Pitx2+ cardiomyocyte (CM)
ratio quantified in c; n = 5 mice per group. d, Pitx2 expression shown by
RNA-seq. AU, arbitrary units; RPKM, reads per kilobase of transcript per
million mapped reads. e, Western blot of Flag and α-tubulin in 6 DPR
Pitx2Flag ventricles, resected at P1, compared to sham; n = 3 mice per
group. f, Nrf2 protein directly binds to the Pitx2 enhancer after LAD-O.
The heart-specific enhancers are marked by H3K27ac ChIP-seq. Red bar
denotes the Nrf2-binding element. g, DHS-seq and chromatin state hidden
Markov model (Chrom. state HMM) tracks of fetal and adult human
heart tissue. Orange indicates active enhancer regions. TSS, transcription
start sites. h, qPCR shows short interfering RNA (siRNA) knockdown
of Nrf2 (siNrf2) in P19 cells; n = 4 biological replicates. i, qPCR of Pitx2
in P19 cells with siRNA targeting Nrf2, and in Nrf2-mutant heart, compared
to controls; n = 4 biological replicates. Data are mean ± s.e.m. *P < 0.05,
one-way analysis of variance (ANOVA) plus Bonferroni post-test (c), and
Mann-Whitney test (h, i) (see Methods). NS, not significant.

Available RNA-sequencing (RNA-seq) data indicated that Pitx2
transcripts in cardiomyocytes dropped postnatally (Fig. 1d), and western
blot analysis revealed Pitx2 protein induction after injury during
regenerative stages (Fig. 1e). Consistent with reduced Pitx2 expression
in adult hearts, active histone marks at the Pitx2 locus were reduced in
adult mouse hearts (Fig. 1f, g). Available DNase I hypersensitive sites
(DHSs) coupled with high-throughput DNA sequencing (DHS-seq)
data revealed that Nrf2 (also known as Nfe2l2) binding elements were
enriched at the Pitx2 locus (data not shown). To evaluate whether Nrf2
activated Pitx2 after injury, we performed an Nrf2 chromatin
immunoprecipitation followed by sequencing (ChIP-seq) experiment on hearts
4 days after left anterior descending artery occlusion (LAD-O) in
postnatal day 2 (P2) mice, and discovered Nrf2 binding at the Pitx2 locus
(Fig. 1f). Nrf2 knockdown in P19 cells and Nrf2 loss-of-function in mice
resulted in decreased Pitx2 mRNA expression, supporting the
conclusion that Nrf2 directly regulates Pitx2 after tissue injury (Fig. 1h, i).
We determined whether Pitx2, similarly to Yap1, was required for
neonatal heart regeneration. Using Cre recombinase driven by the
muscle creatine kinase (MCK, also known as Ckm) gene (MCKcre),
we inactivated Pitx2 in cardiomyocytes and performed P1 apex resection.
While control hearts regenerated as expected, MCKcre;Pitx2f/f (Pitx2
conditional knockout (CKO)) hearts had increased scarring and reduced
function (Fig. 2a-e). We injured Pitx2 mutant hearts by LAD-O at P1,
and used both MCKcre and Mhccre-Ert to inactivate Pitx2 in myocardium.
Pitx2 mutants failed to repair after LAD-O (Extended Data Fig. 1).
We examined cardiomyocyte proliferation in the P1 apex resection
mouse model at 5 days post-resection (DPR) by pulse-labelling and
immunofluorescence of 5-ethynyl-2′-deoxyuridine (EdU). In Pitx2f/f
controls, injury induced a threefold increase in EdU-positive
cardiomyocytes compared to sham that was absent in Pitx2 CKO mice after
injury, supporting the hypothesis that Pitx2, like Yap1, is essential for
neonatal heart regeneration by promoting proliferation and injury
resistance (Fig. 2f-h).

To investigate whether Pitx2 is sufficient for adult cardiomyocyte
repair, we generated Pitx2Gof, a Cre-activated Pitx2c gain-of-function
transgenic line (Extended Data Fig. 2a). Immunoblotting and quantitative
PCR (qPCR) showed increased Pitx2c levels in Mhccre-Ert;Pitx2Gof
(Pitx2-overexpressing) ventricles (Extended Data Fig. 2b-d). LAD-O
performed in 8-week-old mice after tamoxifen administration revealed
that Pitx2-overexpressing hearts had reduced scar size (Fig. 2i, j).
Heart morphology was comparable between controls (Mhccre-Ert)
and Pitx2-overexpressing mice after sham surgery (Extended Data
Fig. 2e-g). Two weeks after LAD-O, both Pitx2-overexpressing and
control mice showed decreased ejection fraction and fractional shortening;
however, Pitx2-overexpressing mice had functional recovery at
3 and 4 weeks after LAD-O (Fig. 2k, l). Non-regenerative stage P8
apex resections revealed that hearts from Pitx2-overexpressing mice
had reduced scarring (Extended Data Fig. 2h-j) and improved function
at 28 DPR compared to controls (Extended Data Fig. 2k, l). EdU
incorporation at 8 DPR showed increased cardiomyocyte S-phase
entry in Pitx2-overexpressing mouse hearts (Extended Data Fig. 2m-o).

Figure 3 | Pitx2 interacts with Yap in regenerating hearts, and its nuclear
shuttling requires Nrf2. a-d, Trichrome-stained control (Salvf/f;Pitx2f/f,
n = 5) (a), Salv CKO (n = 5) (b) and double knockout (DKO, n = 4) (c)
sections at 28 days after LAD-O in P8 mice, with scar size quantification
(d). e, Echocardiography shows the ejection fraction (see Methods for n).
f, Diagram of GST-Pitx2 constructs. g, GST-Pitx2 pull-down assay. Yap was
detected by western blotting. h, i, Immunofluorescent staining
cells after vehicle or 300 μM H₂O₂ treatment,
with control siRNA or siRNA targeting Nrf2
(siNrf2). Arrows, cytoplasmic staining;
arrowheads, nuclear staining. The ratio of
cells with nucleus-localized Pitx2 compared
to total cell number is quantified in i;
3 technical replicates per experiment, repeated
3 times. j, Co-immunoprecipitation of Flag

Because Pitx2 was upregulated in Hippo-deficient hearts, we tested 
whether Pitx2 was required for Hippo-deficient cardiomyocyte 
renewal. Salv (also known as Sav1) CKO hearts regenerate efficiently 
after myocardial infarction*. However, Salv CKO hearts that were 
also Pitx2 mutant (double knockout) failed to regenerate (Fig. 3a-c). 
Twenty-eight days after P8 LAD-O, double knockout hearts had a larger 
scar and compromised ejection fraction (Fig. 3d, e). Apex resection
in non-regenerative P8 hearts also revealed the requirement for Pitx2 
function in Salv CKO cardiomyocyte renewal (Extended Data Fig. 3). 

Available genomic footprinting data from cardiac DHS data sets can 
uncover sequence-specific transcription factor-DNA interactions in 
an unbiased fashion. Motifs for Pitx2 and Tead, the Yap DNA-binding
partner, were highly enriched in fetal heart footprints and often found
in close proximity (Extended Data Fig. 4a, b). Genomic regions containing
Pitx2 or Tead motifs were enriched for histone 3 Lys4 methylation
(H3K4me1) chromatin marks, indicating that Pitx2- or Tead-binding
regions were transcriptionally active. Regions containing both Pitx2 and 
Tead motifs showed globally increased transcriptional activity compared 
to regions containing only Pitx2 motifs (Extended Data Fig. 4c, d). 

The co-occurrence of transcription factor binding motifs often indi- 
cates transcription factor interactions. We tested whether Pitx2 was 
a Yap binding partner using purified glutathione S-transferase (GST) 
fusion proteins. In vitro binding assays with Pitx2 fusion peptides 
and full-length Yap revealed that Yap bound the Pitx2 homeodomain 
(Fig. 3f, g, Extended Data Fig. 5a, b). We uncovered an in vivo inter- 
action between endogenous Pitx2 and Yap using co-immunopre- 
cipitation of endogenous cardiac proteins (Extended Data Fig. 5c). 
The Pitx2Flag allele, previously generated by gene targeting in mouse
embryonic stem (ES) cells, expresses endogenous levels of Flag-
epitope-tagged Pitx2 from the Pitx2 locus.
Figure 4 | Pitx2 regulates redox
balance in neonatal myocardium.
a, Sankey diagram shows direct target
genes of Pitx2 from overlaying ChIP-
seq and RNA-seq profiles.
resection. The GO terms are listed
oxidative phosphorylation. b, Heat
directly targeted by Pitx2. Red bar,
genes co-regulated by Pitx2 and
Yap1; orange, direct binding of Pitx2
Immunofluorescence analysis showed widespread distribution
of Pitx2 in P19 cells, and a cytoplasm-to-nucleus translocation
after hydrogen peroxide (H₂O₂) treatment (Fig. 3h, Extended Data
Fig. 6a), similar to the Nrf2 response to oxidative stress (Extended Data
Fig. 6b). Pitx2 nuclear translocation after H₂O₂ treatment depended
on Nrf2 activity (Fig. 3h, i). By contrast, Nrf2 nuclear translocation
after H₂O₂ treatment was intact in Pitx2-null P19 cells, indicating that
Pitx2 was dispensable for the Nrf2 response to reactive oxygen species
(ROS) (Extended Data Fig. 6b, c). We found that Pitx2 interacts with
Nrf2 in heart extracts expressing endogenous protein levels (Fig. 3j).
Co-immunoprecipitation experiments using nuclear-cytoplasmic
fractionation of P19 cells and analysing endogenous proteins indicated that
Pitx2 binding to Nrf2 in the nucleus was increased after H₂O₂ treatment
(Extended Data Fig. 6d, e). We also found less nuclear Pitx2 in
Nrf2-mutant hearts after P1 apex resection (Extended Data Fig. 6f-h).

To solidify the connection between Nrf2, Pitx2 and Yap, we tested 
whether Nrf2 was required for neonatal regeneration, as is the case for 
Pitx2 and Yap. Myocardial infarction in P2 mice revealed that Nrf2-null 
hearts were unable to regenerate, indicating that induction of the
antioxidant response is required for regeneration (Extended Data Fig. 7).
Notably, Pitx2-overexpressing mice that were heterozygous for
the Nrf2-null allele failed to regenerate, suggesting that
Pitx2 promotes the antioxidant response. It is also possible that Nrf2
is downstream of Pitx2 in certain contexts. We also made
Pitx2-overexpressing mice that were heterozygous for a floxed Yap1 allele
(Yap1f/+). Reducing Yap1 dosage compromised Pitx2-overexpressing
heart regeneration in a P8 mouse resection model (Extended Data
Fig. 8a-e).

To investigate Pitx2 target genes induced by injury, we collected 
P1 resected ventricles from Pitx2f/f and Pitx2 CKO hearts at 5 DPR
and performed RNA-seq (Extended Data Fig. 9a-d). We identified
1,002 downregulated genes in Pitx2 mutants (false discovery rate 
(FDR) < 0.1, fold change > 1.5). There was extensive overlap between 
upregulated genes after apex resection in controls and downregulated 
genes in 5-DPR Pitx2 CKO hearts, indicating that in the absence of 
Pitx2, a set of stress response genes, including oxidative stress response 
genes, fails to be activated (Extended Data Fig. 9a—d). 
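
The selection rule stated above (FDR < 0.1, fold change > 1.5) amounts to a two-condition filter over the differential-expression table. The sketch below is a hedged illustration only: the function name, the dictionary layout and the placeholder gene names are hypothetical, and fold change is assumed here to be expressed as control over mutant so that downregulated genes have values above 1.

```python
def select_downregulated(results, fdr_cutoff=0.1, min_fold_change=1.5):
    """Return the names of genes passing both thresholds, sorted by name.

    `results` maps a gene name to (fold_change, fdr), where fold_change is
    assumed to be control expression divided by mutant expression.
    """
    return sorted(
        gene
        for gene, (fold_change, fdr) in results.items()
        if fdr < fdr_cutoff and fold_change > min_fold_change
    )


# Placeholder values for illustration; these are not data from the study.
example = {
    "gene_a": (2.4, 0.01),   # passes both thresholds
    "gene_b": (1.2, 0.001),  # significant, but change too small
    "gene_c": (3.0, 0.30),   # large change, but not significant
    "gene_d": (1.6, 0.09),   # passes both thresholds
}
print(select_downregulated(example))
```

Intersecting such a list with the set of genes carrying a ChIP-seq peak is the kind of overlay used later in the text to call direct targets.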

We examined the response of Pitx2 and antioxidant scavenger
genes to H₂O₂ in Pitx2-null ES cells since Pitx2 has been
implicated in the ROS response in skeletal muscle. After H₂O₂
treatment, qPCR showed increased Pitx2, Gpx1, Mt1 and Mt2 expression
in wild-type ES cells, but not in Pitx2-null ES cells (Extended Data
Fig. 9f), supporting a crucial role for Pitx2 in the response to ROS.
While ES cells had low endogenous Pitx2 levels, the mouse P19
embryonic carcinoma cell line expressed readily detectable Pitx2,
primarily the Pitx2c isoform. H₂O₂-treated P19 cells increased the
expression levels of Pitx2 and its target gene in a dose-dependent manner
(Extended Data Fig. 9g, h).

To identify Pitx2 target genes, we performed P1 apex resection
and ChIP-seq on 5-DPR Pitx2Flag ventricles (Extended Data Fig. 9e).
Overlay of downregulated genes from Pitx2 CKO RNA-seq with
Pitx2-binding peaks from ChIP-seq revealed 505 direct Pitx2 targets. Gene
Ontology (GO) analysis revealed enrichment in mitochondria,
oxidation-reduction, ribosome and respiratory chain genes (Fig. 4a, b).

Among Pitx2 targets were genes that protect the cell from increased
ROS, such as the superoxide dismutase genes Sod1 and Sod2, which
reduce superoxide to H₂O₂, and the glutathione peroxidase gene Gpx4,
which removes H₂O₂, and Prdx2 (ref. 15) (Fig. 4b). Pitx2 regulates
electron transport chain complex I components including Ndufb3,
Ndufb4 and Ndufb7, and complex IV component Cox7c (Fig. 4b).
Defective complex I in human patients increases ROS sensitivity.
Pitx2 regulated 21.5% of its direct target genes through promoter
binding, as revealed by enrichment of H3K4me3 chromatin marks for
active promoters in Pitx2 peaks (Fig. 4c; 119 out of 505 direct targets).
8-week-old mouse heart H3K4me3 chromatin marks are enriched in
the centre of Pitx2-binding sites, supporting the hypothesis that Pitx2
promotes transcriptional activation.

To determine whether Pitx2 and Yap regulate common target genes, 
we performed Yap ChIP-seq on ventricles 5 days after LAD-O in P2 
mice, and found Yap-binding sites enriched in nearly half of the Pitx2- 
targeted gene promoters (54 out of 119; Fig. 4a, c). Comparison of Pitx2 
ChIP-seq, our Yap ChIP-seq and available Yap ChIP-seq data
revealed four redox genes bound by both Pitx2 and Yap. ChIP-re-ChIP 
assay from heart extracts revealed Pitx2 and Yap were concurrently 
resident on these genes, indicating that Yap and Pitx2 cooperatively 
activate the transcriptional response to oxidative stress (Fig. 4d). 

To investigate ROS activity in Pitx2f/f and Pitx2 CKO apical border
zones at 4 DPR, tissue sections were incubated with ROS-detecting
reagent. Pitx2 CKO hearts had increased ROS in both cardiomyocytes
and non-myocytes (Fig. 4e-i). To determine whether increased ROS
contributed to scarring in 21 DPR Pitx2 CKO hearts, we administered
N-acetyl-L-cysteine (NAC) in Pitx2 CKO neonates after apex resection.
Daily NAC injections until 10 DPR decreased scar size in Pitx2
CKO hearts (Fig. 4j-n).

Increased ROS is a natural response to cardiac injury including
ischaemic damage (Extended Data Fig. 10). During the postnatal
transition from glycolytic to oxidative metabolism, ROS is increased
in the heart and inhibits cardiomyocyte regeneration. In
regenerative-stage hearts, Pitx2 promotes regeneration by inhibiting ROS.
Injury induces Pitx2 expression and activity through Nrf2-activated
Pitx2 transcription and nuclear shuttling. In turn, Pitx2 activates
ROS scavengers, protecting cells from oxidative damage, and electron
transport chain components. It is also possible that Nrf2 acts
downstream of Pitx2 in some contexts. Thus, Pitx2 is essential for the
cardiomyocyte response to injury and may prevent cell death.

We uncovered a Pitx2-Yap interaction important for Hippo-
deficient cardiac regeneration. Pitx2 binds Yap and cooperatively
activates genes maintaining redox balance. Pitx2 gain-of-function,
sufficient to bestow reparative capacity in adult cardiomyocytes, is
repressed by Yap1 heterozygosity. This suggests that Pitx2 recruits
Yap to target genes even when Hippo is active and the pool of nuclear
Yap is relatively low. This mechanism may work in parallel with other
mechanisms by which Yap protects the cell from ROS.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
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METHODS 


Mouse alleles and transgenic lines. All animal protocols and procedures
were approved by the Institutional Animal Care and Use Committee (IACUC)
of Baylor College of Medicine, Houston, Texas 77030, USA. All surgeries and
echocardiographic studies were carried out blinded to the genotype of the mice.
Littermate controls were used whenever possible. Both male and female mice
were used. The MCKcre, Myh6-cre/Esr1 (Mhccre-Ert) transgenic line, floxed alleles
for ww45/salvador (Salvf/f) and Pitx2 (Pitx2f/f), and Flag-tagged Pitx2 allele
(Pitx2Flag) have been described previously. The Pitx2Gof construct for
overexpressing Pitx2 was generated by introducing a 0.9 kb Pitx2c cDNA coding
sequence into a CMV-CAG-loxP-eGFP-Stop-loxP-IRES-βGal expression vector. The
linearized construct was subjected to pronuclear injection. MCKcre;Pitx2f/f (Pitx2
CKO), Mhccre-Ert;Salvf/f (Salv CKO), Mhccre-Ert;Salvf/f;Pitx2f/f (DKO) and
Mhccre-Ert;Pitx2Gof (Pitx2-overexpressing) mice were generated by cross breeding. After
Cre-mediated recombination, the Pitx2 floxed allele removes all Pitx2 isoform
function. DNA was extracted from tail biopsies for genotyping. The primers
for Pitx2Gof are: forward, 5′-CACATGAAGCAGCACGACTT-3′; reverse,
5′-TGCTCAGGTAGTGGTTGTCG-3′. The Nrf2 mutant line is available from the Jackson
Laboratory (strain name B6.129X1-Nfe2l2tm1Ywk/J, stock number 017009). The Yapf/f
strain has been described previously.

Cardiac apex resection. Surgical resection of the heart apex was performed on
P1 and P8 mice as described previously. For P8 surgery, tamoxifen was
administered daily from P7-P10, by subcutaneous injection at a dosage of 40 mg kg⁻¹
(ref. 28). Vicryl sutures (6-0 absorbable) were used to close the thoracic cavity,
and the entire procedure required approximately 12 minutes from the onset of
hypothermia to recovery. Sham procedures excluded apex amputation. Mice were
subjected to echocardiography and then euthanized at 21 DPR for P1 resection, or
28 DPR for P8 resection. Dissected hearts were processed for histology and
immunohistochemistry. Fibrotic scar size was measured using ImageJ 1.43u (National
Institutes of Health) and the n number for each group is as follows: Fig. 2, 10 for
Pitx2f/f, 8 for Pitx2 CKO; Supplementary Fig. 2, 10 for Mhccre-Ert, and 7 for
Pitx2-overexpressing; Supplementary Fig. 3, 5 for Salvf/f;Pitx2f/f, 7 for Salv CKO, 3 for
DKO; Supplementary Fig. 8, 4 for each genotype.

LAD-O. LAD-O in P8 mice was performed according to previous descriptions;
tamoxifen was administered daily from P7-P10, by subcutaneous injection at a
dosage of 40 mg kg⁻¹ (ref. 28). Nylon sutures (8-0 non-absorbable) were used to
occlude the LAD. Proper occlusion was noted by blanching of the myocardium
and during dissection 3-4 weeks after occlusion via visual inspection. Vicryl
sutures (6-0 absorbable) were used to close the thoracic cavity, and the entire
procedure required approximately 12 minutes from the onset of hypothermia
to recovery. Sham procedures excluded placement of a suture around the LAD.
Mice were subjected to echocardiography, and then euthanized at 3-4 weeks
after occlusion. Hearts were processed for histology and immunohistochemistry.
Automated fibrotic scar size was measured using image segmentation MIQuant,
open source code for Matlab. The n number for each group is as follows: Fig. 3, 5
for Salvf/f;Pitx2f/f, 5 for Salv CKO, 4 for DKO. Alternatively, LAD-O was performed
at P2, with minor modification from P1 apex resection and P8 LAD-O procedures
described above; tamoxifen was administered daily from P2-P3 when needed.
The n number for each group is as follows: Supplementary Fig. 1, 8 for Pitx2f/f,
7 for MCKcre;Pitx2f/f, 4 for Mhccre-Ert;Pitx2f/f.

Adult LAD-O was performed as described for P8 with minor modifications.
For Pitx2-overexpressing and control (Mhccre-Ert) mice, surgery was performed
in 8-week-old mice, and tamoxifen was administered by intraperitoneal injection
at three time points: 7 and 6 days before LAD-O and within 2h after LAD-O,
at a dosage of 40 mg kg⁻¹. Echocardiography was performed at 2, 3 and 4 weeks
after LAD-O. The mice were then euthanized and hearts were subjected to
histology. Automated fibrotic scar size was measured as described for P8 LAD-O.
The n number for each group is as follows: Fig. 1, 5 for control and 5 for Salv
CKO, 5 sham controls for each group; Fig. 2, 5 for Mhccre-Ert LAD-O, 8 for
Pitx2-overexpressing LAD-O. Five sham controls were used for each genotype.
Echocardiography. Echocardiography was performed in the Baylor College of
Medicine Mouse Phenotyping Core using a VisualSonics 2100 system. Evaluation
of ejection fraction and fractional shortening of apex resection model was
performed as previously described. The n number for each group is as follows:
Fig. 2, Pitx2f/+, 3 for sham, 6 for resection; Pitx2f/f, 6 for sham, 5 for resection;
Pitx2 CKO, 5 for sham, 17 for resection; adult LAD-O, control (Mhccre-Ert), 4 for
sham, 5 for LAD-O; Pitx2-overexpressing, 5 for sham, 8 for LAD-O. Fig. 3, control
(Salvf/f;Pitx2f/f), 11 for sham, 12 for LAD-O; Salv CKO, 5 for sham, 4 for LAD-O;
DKO, 6 for sham, 7 for LAD-O. Supplementary Fig. 1, Pitx2f/f, 7 for sham, 8 for
LAD-O; MCKcre;Pitx2f/f, 5 for sham, 7 for LAD-O; Mhccre-Ert;Pitx2f/f, 4 for sham,
4 for LAD-O. Supplementary Fig. 2, control (Mhccre-Ert), 6 for sham, 25 for
resection; Pitx2-overexpressing, 5 for sham, 16 for resection. Supplementary Fig. 7,
control (C57BL6), 3 for sham, 5 for LAD-O; Nrf2-null, 3 for sham, 5 for LAD-O.


EdU incorporation. For P6 and P16 mice, 0.25 mg of EdU was injected subcuta- 
neously 7h before collecting the hearts. After dissection, hearts were fixed with 
10% neutral buffered formalin and processed for paraffin embedding. Seven- 
micrometre-thick tissue sections were prepared. EdU incorporation was detected 
using the Click-it EdU imaging kit (Life Technologies). Tissue slides were imaged 
with a Leica TCS SP5 confocal microscope, and images were processed with Leica 
LAS AF software (Leica Microsystems). 

Cardiomyocyte proliferation studies. To assess cardiomyocyte proliferation rates, 
5 DPR Pitx2 CKO and control (Pitx2“) mice and 8 DPR Pitx2-overexpressing and 
control (Mhc**") mice (as described earlier) were used. EdU labelling and detec- 
tion were performed as described above. Mouse monoclonal anti-cTnT (1:200) 
(Thermo Scientific) was used to label cardiomyocytes. Images were acquired as 
described earlier. The cardiomyocyte proliferation rate was calculated by dividing 
the number of EdU-positive cardiomyocytes by the total number of cardiomy- 
ocytes in the field. Three comparable sections (every third section) from each 
heart were used. 
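
The proliferation rate described above is a simple ratio per imaged field. The sketch below shows one way to compute it and combine the three sections taken from one heart. The counts are hypothetical, and averaging the per-section rates is an assumption, since the text does not state how the three sections were combined.

```python
from statistics import mean


def proliferation_rate(edu_positive_cm: int, total_cm: int) -> float:
    """EdU-positive cardiomyocytes divided by all cardiomyocytes in one field."""
    if total_cm <= 0:
        raise ValueError("total cardiomyocyte count must be positive")
    if not 0 <= edu_positive_cm <= total_cm:
        raise ValueError("EdU-positive count must lie between 0 and the total")
    return edu_positive_cm / total_cm


def heart_proliferation_rate(sections):
    """Average the per-section rates for one heart (assumed aggregation)."""
    return mean(proliferation_rate(pos, total) for pos, total in sections)


# Hypothetical counts for three comparable sections of one heart:
# (EdU-positive cardiomyocytes, total cardiomyocytes)
example_sections = [(6, 200), (9, 300), (5, 250)]

if __name__ == "__main__":
    rate = heart_proliferation_rate(example_sections)
    print(f"Cardiomyocyte proliferation rate: {rate:.2%}")
```

Pooling the counts across sections before dividing would weight larger fields more heavily; either choice should be applied uniformly to all groups.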

NAC administration. NAC (PharmaGrade, A5099 Sigma-Aldrich) was dissolved in 
sterile PBS at a concentration of 10 mg ml−1. After P1 apex resection, mice were 
weighed daily, and NAC solution was injected subcutaneously from 1 to 10 DPR at 
a dosage of 75 mg kg−1. Three comparable sections (every third section) from each 
heart were used, and five hearts were used in each group in Fig. 4j-n. 
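
Because the mice were weighed daily, the injected volume follows from body weight, the 75 mg kg−1 dose and the 10 mg ml−1 stock. The following is a minimal sketch of that arithmetic; the function name and the 2 g example weight are illustrative assumptions, not values from the study.

```python
def injection_volume_ul(body_weight_g: float,
                        dose_mg_per_kg: float = 75.0,
                        stock_mg_per_ml: float = 10.0) -> float:
    """Volume of stock solution (in microlitres) that delivers the dose."""
    if body_weight_g <= 0 or dose_mg_per_kg <= 0 or stock_mg_per_ml <= 0:
        raise ValueError("weight, dose and stock concentration must be positive")
    dose_mg = dose_mg_per_kg * body_weight_g / 1000.0   # g -> kg
    volume_ml = dose_mg / stock_mg_per_ml
    return volume_ml * 1000.0                            # ml -> microlitres


if __name__ == "__main__":
    # Hypothetical 2 g neonate: 0.15 mg NAC, i.e. 15 microlitres of stock.
    print(f"{injection_volume_ul(2.0):.1f} ul")
```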

Cell culture. P19 cells (ATCC CRL-1825) were cultured in αMEM medium 
(Mediatech, Corning), supplemented with 10% FBS (Gibco, Life Technologies) 
and 1% penicillin/streptomycin (HyClone Laboratories, Thermo Scientific). 0.25% 
trypsin was used for dissociating and splitting cells. H2O2 (Sigma-Aldrich) and 
doxorubicin (D-4000, LC Laboratories) were diluted in αMEM with 1% FBS and 
1% penicillin/streptomycin at final concentrations of 300 µM and 0.5 µM, respectively. 
After 8 h of treatment, cells were collected and subjected to mRNA or protein 
extraction. The ES cells used in this study have been described previously’. 
Mycoplasma detection kit (B39030, http://www.biotool.com) and MycoAlert kit 
(LT07-318, Lonza) were used, and no contamination was observed. 
Transfection of siRNA in P19 cells. Lipofectamine RNAiMAX transfection 
reagent (ThermoFisher Scientific) was used to deliver siRNA targeting Nrf2 
into P19 cells following the manufacturer’s guideline. The siRNA oligonucleo- 
tides were pre-designed DsiRNA Duplex from Integrated DNA Technologies. 
Oligonucleotide sequences: antisense, rGrArUrGrUrCrArArUrCrArArArUrCrCr 
ArUrGrUrCrCrUrGrCrUrG; sense, rGrCrArGrGrArCrArUrGrGrArUrUrUrGrAr 
UrUrGrArCrATC. 

Generation of P19 Pitx2 knockout cell line using CRISPR-Cas9 technique. 
Lentivirus expressing Cas9 was used to transduce P19 cells to generate 
a stable cell line expressing Cas9 (J.F.M. et al., unpublished data). Guides 
targeting exons 5 and 6 of the Pitx2 locus were designed using Optimized 
CRISPR Design (http://crispr.mit.edu, F. Zhang laboratory, MIT 2015). Guide 
sequences used: upstream, 5′-CACCGAATGAGGATGTGGGCGCCG, 
3′-AAACCGGCGCCCACATCCTCATTC; downstream, 5′-CACCGTGTCCCTATAAACGTACGG, 
3′-AAACCCGTACGTTTATAGGGACAC. Guides were 
inserted into pSpCas9(BB)-2A-GFP (PX458) (F. Zhang, Addgene plasmid 
48138)°'. Cas9-expressing P19 cells were transfected with both guide plasmids 
simultaneously, using Lipofectamine 2000 Transfection Reagent (Thermo 
Scientific) according to the manufacturer’s manual. GFP-positive cells were 
sorted using a BD FACSAria cell sorter. Single-cell clones were expanded and 
genotyped using the following primers: (1) 494 bp for wild type, undetectable 
for knockout: forward: 5′-GCACACACCCACACTTTCAC-3′, reverse: 
5′-CTTCCACCCACCACTCCTAC-3′; (2) 272 bp for wild type, undetectable 
for knockout: forward: 5′-GAATGGGAAAAGAGGGGAAA-3′, reverse: 
5′-CCAGCTTCTGGACTCAGCTT-3′; and (3) 558 bp for wild type, undetectable 
for knockout: forward: 5′-CCCCTTCTTCAACTCCATGA-3′, reverse: 
5′-CTTGGGGACATTCCTTTGAA-3′. The forward primer of (1) and reverse primer of (3) 
were also combined as the fourth primer pair. The confirmed P19 Pitx2 knockout 
cell line was used in this study. 
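
The two oligonucleotides listed for each guide can be checked for consistency before cloning. The sketch below assumes that both strings are written 5′ to 3′ and that the leading CACC and AAAC are the cloning overhangs, so that the remainder of the second oligo should be the reverse complement of the remainder of the first. The helper names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def oligos_anneal(top: str, bottom: str,
                  top_overhang: str = "CACC",
                  bottom_overhang: str = "AAAC") -> bool:
    """True if the two oligos pair perfectly once the overhangs are set aside."""
    if not top.startswith(top_overhang) or not bottom.startswith(bottom_overhang):
        return False
    top_core = top[len(top_overhang):]
    bottom_core = bottom[len(bottom_overhang):]
    return bottom_core == reverse_complement(top_core)


# Guide oligo pairs as given in the text.
GUIDES = {
    "upstream": ("CACCGAATGAGGATGTGGGCGCCG", "AAACCGGCGCCCACATCCTCATTC"),
    "downstream": ("CACCGTGTCCCTATAAACGTACGG", "AAACCCGTACGTTTATAGGGACAC"),
}

if __name__ == "__main__":
    for name, (top, bottom) in GUIDES.items():
        status = "consistent" if oligos_anneal(top, bottom) else "MISMATCH"
        print(f"{name}: {status}")
```

A mismatch reported by such a check would usually point to a transcription error in one of the two strands rather than to a design problem.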

Cytoplasmic and nuclear fraction. In brief, P19 cells were cultured in 10-cm plates 
at 80% confluence. The cells were treated with vehicle (water) or H2O2 (300 µM) 
for 8h before being collected for cell fraction assay. The NE-PER Nuclear and 
Cytoplasmic Extraction Reagents (Thermo Scientific) were used according to the 
manufacturer’s manual. 

Histology and immunofluorescence. Trichrome staining was performed as pre- 
viously described”*. Fixation, tissue processing, antigen retrieval and blocking for 
nonspecific staining have been described previously*’. Samples were incubated 
in primary antibody at 4°C overnight. After washing in PBS with Tween 20, sec- 
tions were incubated in the appropriate fluorescent-labelled secondary antibodies, 
followed by counterstaining with DAPI (10 µg ml−1) (Roche), and then mounted 
in VECTASHIELD hardset mounting medium (Vector Laboratories). P19 cells 
were fixed in formalin (VWR International) for 10 min, then permeabilized in 
0.2% Triton X-100 (Bio-Rad Laboratories) in PBS. After blocking in 10% sheep 
serum (Sigma-Aldrich) for 30 min, cells were incubated with primary antibody 
for 2h at room temperature, followed by a 1-h incubation in proper fluorescent- 
labelled secondary antibodies. Cells were counterstained with DAPI (Roche) then 
mounted in VECTASHIELD hardset mounting medium (Vector Laboratories). 
Primary antibodies used were as follows: mouse monoclonal anti-cTroponin T 
(1:200; Thermo Scientific), rabbit polyclonal anti-Pitx2 (1:400; Capra Science) and 
rabbit polyclonal anti-Nrf2 (1:200; Abcam). Secondary antibodies used were as 
follows: Alexa Fluor 488 goat anti-rabbit IgG and Alexa Fluor 546 donkey anti- 
mouse IgG (1:400-1:800; Life Technologies); biotinylated anti-mouse IgG (1:200; 
Vector Laboratories); streptavidin-Alexa Fluor 647 (1:200; Life Technologies). 
Immunofluorescent images were captured on (1) a Leica TCS SP5 confocal micro- 
scope (all functions controlled via Leica LAS AF software (Leica Microsystems)); 
(2) a Zeiss LSM 510 META laser scanning confocal microscope (all functions 
controlled via Zeiss LSM Image Browser software (Carl Zeiss Microimaging)); 
or (3) a Nikon Eclipse 80i upright microscope (all functions controlled by the NIS- 
Elements BR3.1 software program (Nikon Instruments)). All manuscript figures 
were prepared using Adobe Photoshop CS5 (Adobe Systems Inc.). 

ROS detection. Pitx2 CKO and control (Pitx2f/f) 4 DPR hearts were cryo-embedded, 
and 10 µm tissue sections were prepared. CellROX green reagent (Life 
Technologies) was used to detect the presence of ROS according to the manufacturer's 
manual with minor modifications. Tissue slides were warmed to room temperature 
(25 °C), and rinsed with PBS three times. CellROX substrate was added and incubated 
for 10 min at 37 °C. Slides were given three 5-min washes with PBS, and 10% 
neutral buffered formalin was added. After a 15-min fixation, mouse anti-MF20 
IgG (1:50; Developmental Studies Hybridoma Bank) and Alexa Fluor 546 donkey 
anti-mouse IgG (1:400; Life Technologies) were used to counterstain cardiomyocytes. 
Nuclei were highlighted by DAPI. Slides were mounted in VECTASHIELD 
hardset mounting medium (Vector Laboratories), and imaged using a Nikon 
Eclipse 80i upright microscope (Nikon Instruments) within 2h of completion. 
Co-immunoprecipitation. Pitx2!"s mice were subjected to P2 LAD-O, and 
ventricular tissue was collected at 4 DPMI. P19 cell fractions were obtained as 
described above. For Flag pull-down, anti-Flag M2 affinity gel (Sigma-Aldrich) 
was used. For Nrf2 pull-down, rabbit polyclonal anti-Nrf2 (Abcam) and protein 
A/G PLUS-Agarose (Santa Cruz Biotechnology) were used according to the 
manufacturer’s manual. 

Western blot. Ventricles of 6 DPR (n=3) and sham (n= 3) Pitx2!"8 mice, P16 
Mhce“®*" (control, n= 3) and Pitx2-overexpressing (n= 3), as well as P19 cells 
were collected and lysed in RIPA buffer, and the protein concentration was 
quantified using Pierce BCA protein assay kit (Pierce Biotechnology) as previously 
described?’. The P19 cell fractions were acquired as described earlier, with 
three biological replicates. Co-immunoprecipitation samples were acquired as 
described above. In brief, after separation via SDS-PAGE, proteins were trans- 
ferred to PVDF membranes (EMD Millipore), blocked in 5% milk/TBS-Tween 
20 and incubated with appropriate primary antibodies (all with 1:1,000 dilution 
in TBST) overnight at 4 °C (rabbit anti-Flag IgG and mouse anti-α-tubulin IgG, 
Sigma-Aldrich; rabbit anti-Pitx2 IgG, Capra Science; rabbit anti-Yap, Novus 
Biologicals; rabbit anti-TATA-binding protein, Cell Signaling Technology; rabbit 
polyclonal anti-Nrf2, Abcam). Membranes were then washed three times in TBST 
and incubated with goat-anti-rabbit and goat-anti-mouse horseradish peroxidase 
(HRP)-conjugated secondary antibodies (1:5,000; Santa Cruz Biotechnology) for 
1 h at room temperature. Protein detection was performed using SuperSignal West 
Pico Chemiluminescent Substrate (Thermo Scientific). For quantification, Pitx2 
and α-tubulin band densities for control (Mhc"*#") and Pitx2-overexpressing mice 
were determined using ImageJ software (National Institutes of Health). 

ChIP. P1 apex resection or sham surgery was performed on Pitx2!"8 neonates; 
P2 LAD-O was performed on C57BL6 neonates. At 5 days after surgery, whole 
ventricles were collected and subjected to ChIP assay using the EZ-ChIP kit 
(Millipore) according to the manufacturer’s protocol. For Flag ChIP, anti-Flag M2 
affinity gel (Sigma-Aldrich) was used; for Nrf2 ChIP, rabbit polyclonal anti-Nrf2 
antibody (Abcam) was used; for Yap ChIP, rabbit polyclonal anti-Yap antibody 
(Novus Biologicals) was used. 

ChIP-re-ChIP. P2 LAD-O was performed on Pitx2!8 neonates, and 4 days 
later whole ventricles were collected and subjected to the ChIP-re-ChIP assay 
as previously described’. Monoclonal anti-Flag BioM2 antibody (Sigma-Aldrich) 
was used for pulling down Flag; rabbit anti-YAP (Novus Biologicals) 
was used for pulling down Yap. For qPCR analysis of the peak and non-peak 
regions, the following primers were used. For peak region: Oxnad1, forward: 
5′-GGGTTTTTAGTGGGCAACCTAT-3′, reverse: 5′-CTGGGCTTTAGAGACAGCTAGG-3′; 
Ldha, forward: 5′-CCAGAGATCTTGTCCAGTCCTT-3′, 
reverse: 5′-TTCAGTTCCAAAATGGGGATAC-3′; Ndufb3, forward: 
5′-GTACCGGAACGTTTACCATCTC-3′, reverse: 5′-CACCAGCTCCCTAAATTACCTG-3′; 
Ndufs8, forward: 5′-CATAGTGCCCTTTCCTTCTTTG-3′, reverse: 
5′-GGCGCATTAACCTCTCTGATAC-3′. 

For non-peak region: Oxnad1, forward: 5′-TAATGAGATCTGGTGCCCTCTT-3′, 
reverse: 5′-GACACACCCCATCTGTACTTCA-3′; Ldha, forward: 
5′-GTATCTTCACAGGCCTTTCCTG-3′, reverse: 5′-GGCTGTGGGACAATGTTCTTAT-3′; 
Ndufb3, forward: 5′-TGCTACTCTTCCAGAGGACCTT-3′, reverse: 
5′-GGTATGTGTGTGTGTGATGTGC-3′; Ndufs8, forward: 
5′-TCTACTGCCTTTATGGCGTTTT-3′, reverse: 5′-CTCATGGGCTGTGACAATAGAA-3′. 
qPCR. For Fig. 1h, i, mRNA was prepared from P19 cells, and control (C57BL6) 
and Nrf2"™"" ventricles at P1. For Fig. 4d, DNA was prepared as described in 
the ‘ChIP-re-ChIP’ section. For Supplementary Fig. 9e, DNA was prepared 
according to the ChIP protocol as mentioned above. For Supplementary 
Fig. 2b, mRNA of Mhc‘'**" (control) and Pitx2-overexpressing ventricles was 
prepared at P16, 8 days after the first tamoxifen injection (daily from P7-P10). 
For Supplementary Fig. 9f, g, mRNA was prepared from P19 cells and ES cells. 
Total mRNA was extracted using miRNeasy Mini Kit (Qiagen); cDNA was 
generated using qScript cDNA supermix (Quanta BioSciences); qPCR was 
performed on a StepOnePlus Real-Time PCR system (Applied Biosystems) 
with iTaq Universal SYBR Green Supermix (Bio-Rad Laboratories), all according 
to the manufacturers’ manuals. The primers used for ChIP-qPCR are as 
follows: Gpx1 region1, forward: 5′-GCTTCATCCCTCCTAATGGA-3′, reverse: 
5′-TGCCAGCATTAACTCAGAGC-3′; Gpx1 region2, forward: 
5′-TCTTCCTAGGCGGGACTCTA-3′, reverse: 5′-GGGTCTGGTCTAGCTCCTGT-3′; Mt1, 
forward: 5′-TTCTGCAGTCCAGTCTGACC-3′, reverse: 
5′-ATAGGAGATGGCCTGGTGAC-3′; Mt2 region1, forward: 5′-GCCCTCCCACCTACTCATTA-3′, 
reverse: 5′-GGTGACTGTCATCCCACTTG-3′; Mt2 region2, forward: 
5′-TTCACTAAGAGCTGCGAGGA-3′, reverse: 5′-ATCTGCAGAGCCAGGAAACT-3′; Sod2 
region1, forward: 5′-ACGTGGCTTCAGGAGAGTTT-3′, reverse: 
5′-CAATATCGCTTGCTCTCAGC-3′; Sod2 region2, forward: 
5′-CTCTCATGCATGCAAATCCT-3′, reverse: 5′-CAGCTCTAAGGGACCCAGAC-3′; Sod1 region1, 
forward: 5′-ACTGTGACCCTGCAAAAACA-3′, reverse: 
5′-GTCCACCACTTCAGAGAGCA-3′; Txn2 region1, forward: 5′-CCACACAGCTGAAGGAGAGA-3′, 
reverse: 5′-GGAGTGCTGGGAATGTAGGT-3′; Txn2 region2, forward: 
5′-CCAAAATACCCAAGCCTGTT-3′, reverse: 5′-TTTCCACATGCCTCTGTCTC-3′; 
Ndufv1 region1, forward: 5′-GGCTGCGAGGAAGAAATAAC-3′, 
reverse: 5′-ACTAACGGTCCCAACTCCAG-3′; Ndufv1 region2, forward: 
5′-ACAAGATGCAGGTCATGGAA-3′, reverse: 5′-ATCAGAGCCACACTGTCTGC-3′; 
Cox5a, forward: 5′-GCTGTTCTGGGATTGGATCT-3′, reverse: 
5′-AGAGCCTGTCTCTCCCAAAA-3′. 

The primers used for other qPCR are as follows: Gpx1, forward: 
5′-GTCCACCGTGTATGCCTTCT-3′, reverse: 5′-CTCCTGGTGTCCGAACTGAT-3′; 
Mt2, forward: 5′-CCGATCTCTCGTCGATCTTC-3′, reverse: 
5′-AGGAGCAGCAGCTTTTCTTG-3′; Mt1, forward: 5′-GCTGTCCTCTAAGCGTCACC-3′, 
reverse: 5′-AGGAGCAGCAGCTCTTCTTG-3′; Sod2, forward: 
5′-GGCCAAGGGAGATGTTACAA-3′, reverse: 5′-GCTTGATAGCCTCCAGCAAC-3′; 
Txn2, forward: 5′-CCCCTCAGTACAATGCTGGT-3′, reverse: 
5′-TCCATCCTGGACGTTAAAGG-3′; Pitx2, forward: 5′-AGGGAGGGAGGCAAGAAAAG-3′, 
reverse: 5′-CTTGAAAGAGCCAGGGAACG-3′; Nrf2, 
forward: 5′-CCAGAAGCCACACTGACAGA-3′, reverse: 
5′-GGAGAGGATGCTGCTGAAAG-3′. 
RNA-seq and ChIP-seq. Total mRNA was extracted from the ventricles of Pitx2 
CKO and Pitx2f/f mice 5 days after P1 apex resection, using the miRNeasy Mini 
Kit (Qiagen); ChIP DNA was acquired as described above. RNA-seq and ChIP-seq 
were performed using the Ion Proton system for next-generation sequencing 
according to the manufacturer’s directions. Sequenced reads were mapped 
to the mm9 genome using the Ion Torrent TMAP aligner with the ‘map4’ option. We used 
HTSeq-Count (version 0.5.4) to quantify the aligned RNA-seq reads against exon 
regions of genes in the RefSeq mm9 annotation. Differentially expressed genes were 
detected using the R package DESeq with thresholds of P < 0.05, fold change > 1.5 and 
FDR < 10%. ChIP-seq peaks were detected with the HOMER ‘findPeaks’ command 
using a threshold of FDR < 10%. Only peaks detected from both biological 
replicates were annotated and overlaid with the differential gene expression list. GO 
analysis was performed on DAVID online platform. Terms with P < 0.05 were 
included. Published data sets used in Fig. 1f were obtained from Gene Expression 
Omnibus (GEO) (series GSE52386, reviewed in ref. 32). Mapped human DHS- 
seq data were extracted from the GEO (GSE18927, GSE32970). In total, 1,002 
genes were downregulated in Pitx2 CKO 5 DPMI and were overlaid with Pitx2 
ChIP-seq binding genes, which were further overlaid with Yap ChIP-seq from 
control 5 DPMI. We define genes that were co-regulated by both Pitx2 and Yap as 
expression level decreased in Pitx2 CKO heart and bound by both factors on the 
promoter regions. Overrepresented GO analysis was performed using online tool 
DAVID 6.7 (https://david.ncifcrf.gov/). 
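
The thresholding and overlay steps described above amount to a filter followed by set intersections. The sketch below illustrates this on made-up records; the gene names, values and the `GeneResult` structure are hypothetical, and treating "fold change > 1.5" as a symmetric cut-off on the log2 scale is an assumption rather than a statement of how DESeq was configured.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneResult:
    gene: str
    log2_fold_change: float  # Pitx2 CKO relative to control
    p_value: float
    fdr: float


def is_differential(result: GeneResult, p_max: float = 0.05,
                    fold_min: float = 1.5, fdr_max: float = 0.10) -> bool:
    """Apply the P, fold-change and FDR thresholds in either direction."""
    return (result.p_value < p_max
            and result.fdr < fdr_max
            and abs(result.log2_fold_change) > math.log2(fold_min))


def co_regulated(results, pitx2_bound, yap_bound):
    """Genes downregulated in the CKO and bound by both factors."""
    downregulated = {r.gene for r in results
                     if is_differential(r) and r.log2_fold_change < 0}
    return downregulated & set(pitx2_bound) & set(yap_bound)


# Hypothetical toy data, for illustration only.
toy_results = [
    GeneResult("GeneA", -1.2, 0.001, 0.02),    # passes, downregulated
    GeneResult("GeneB", -0.3, 0.010, 0.05),    # fold change too small
    GeneResult("GeneC", -1.0, 0.040, 0.20),    # fails FDR
    GeneResult("GeneD", 1.5, 0.001, 0.01),     # passes, but upregulated
    GeneResult("GeneE", -2.0, 0.0001, 0.001),  # passes, not Yap-bound
]
toy_pitx2_bound = {"GeneA", "GeneB", "GeneE"}
toy_yap_bound = {"GeneA", "GeneD"}

if __name__ == "__main__":
    print(sorted(co_regulated(toy_results, toy_pitx2_bound, toy_yap_bound)))
```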


© 2016 Macmillan Publishers Limited. All rights reserved 




GST pull-down assay. The mouse Yap, Pitx2a, Pitx2c and truncated proteins were 
prepared as previously described*’. In brief, GST-tagged proteins were expressed in 
and purified from BL21 competent Escherichia coli (New England Biolabs), using 
Glutathione Sepharose 4B (GE Healthcare). YAP was cleaved from GST using 
PreScission Protease (GE Healthcare), and 1 µg was incubated with 15 µg of each 
truncated GST-Pitx2 protein for the pull-down assay. Incubation of the corresponding 
truncated Pitx2 protein alone was used as a control. Purified YAP protein (2 µg) 
was loaded into the gel as a control. Rabbit anti-YAP antibody (Cell Signaling 
Technology) was used for immunoblotting and the detection of YAP. 

Coomassie blue staining. GST-Pitx2, GST-Yap and cleaved Yap protein were run 
on a 10% SDS-PAGE gel. The gel was then stained for protein in Coomassie blue 
stain (2.5 g l−1 in H2O:methanol:glacial acetic acid at a ratio of 9:9:2) for 1 h with 
shaking, followed by destaining with Coomassie solvent (H2O:methanol:glacial 
acetic acid = 9:9:2) for 2 h with shaking. The stained gel was scanned with EPSON 
Perfection 4490 Photo (Epson America). 

Statistics. Each experimental group in the ChIP-seq and RNA-seq studies had 
n=2. All quantitative experiments (for example, qPCR, western blot, cell count) 
have at least three independent biological repeats. For animal studies (neonatal 
and adult surgery), sample sizes were estimated based on our pilot studies. The 
n number for each experiment is summarized in relevant sections in the Methods. 
Differences between groups were examined for statistical significance using a 
non-parametric test (Mann-Whitney test) for two groups, or one-way ANOVA 
plus Bonferroni post-test for more than two groups. Equal variances were assumed 
(no Welch's correction). Grubbs’s test was used to identify outliers (GraphPad 
Prism, GraphPad Software). In the case of Fig. 4d, the Wilcoxon signed-rank test (with 
a hypothetical value of 0) was used to compare the anti-Yap plus anti-Flag group to 
the anti-IgG group, as well as to the anti-Yap-anti-IgG group, since the latter two groups were 
undetected in the qPCR assay. The same test (Wilcoxon signed-rank) was also applied 
to Fig. 1h, i (left panel) (with a hypothetical value of 1). All bar graphs represent 
mean ± s.e.m. *P < 0.05, ***P < 0.001 were considered statistically significant. 
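
For readers who want to see what the two-group comparison and the mean ± s.e.m. summary involve, the sketch below computes both with the standard library only. It is an illustration under stated assumptions, not a reproduction of the GraphPad Prism analysis: the exact two-sided P value is obtained by enumerating all group relabellings, which is practical only for small samples, and the example values are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, standard error of the mean) for a sample of size >= 2."""
    return mean(values), stdev(values) / sqrt(len(values))


def mann_whitney_u(x, y):
    """U statistic for x: pairs with x_i > y_j count 1, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def mann_whitney_exact_p(x, y):
    """Exact two-sided P value by enumerating every relabelling of the pool."""
    pooled = list(x) + list(y)
    n_x = len(x)
    expected = n_x * len(y) / 2.0
    observed = abs(mann_whitney_u(x, y) - expected)
    as_extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_x):
        chosen = set(idx)
        group_x = [pooled[i] for i in chosen]
        group_y = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mann_whitney_u(group_x, group_y) - expected) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / total


if __name__ == "__main__":
    control = [1.0, 2.0, 3.0, 4.0]   # hypothetical measurements
    treated = [5.0, 6.0, 7.0, 8.0]
    m, s = mean_sem(control)
    print(f"control: {m:.2f} +/- {s:.2f} (s.e.m.)")
    print(f"exact two-sided P = {mann_whitney_exact_p(control, treated):.4f}")
```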
Extended Data Figure 1 | Pitx2 is required in neonatal myocardial 
regeneration after LAD-O. a, Serial trichrome images of control (Pitx2/), 
MCK™;Pitx2!, and Mhc"**";Pitx2/21 days after LAD-O performed in 
b, Percentage of fibrotic left ventricular myocardium quantified at 3 weeks 
after LAD-O; n = 8 for control (Pitx2”), n = 7 for MCK;Pitx2/, and n = 4 
for Mhc*t®="';Pitx2//, c, d, Ejection fraction (c) and fractional shortening 
(d) of LAD-O and sham hearts (see Methods for n). L, left ventricle; 
Bonferroni post-test (c, d) and Mann-Whitney test (b). 
Extended Data Figure 2 | Pitx2 promotes myocardial regeneration 
after apex resection at P8. a, Schematic of Pitx2-expressing construct 
(Pitx2°%/), b-d, Pitx2°% was crossed with the Mhc"*=" strain to generate 
Mhe** "+ Pitx2°% (Pitx2-overexpressing) mice. After tamoxifen treatment 
from P7-P10, qPCR (b, n = 4) and western blot (c, d, n = 3) show the 
overexpression of Pitx2 in the myocardium at P16. e, f, Trichrome-stained 
cross sections from 13-week-old sham hearts of control (e) and Pitx2-overexpressing 
(f) mice, with tamoxifen administered at 7-8 weeks old. 
g, Heart weight over body weight ratio of adult sham and LAD-O hearts; 
n = 4 (control sham), 4 (control LAD-O), 5 (Pitx2-overexpressing sham), 
9 (Pitx2-overexpressing LAD-O). h-j, Apex resection of Pitx2-overexpressing 
(i) and control (Mhc“*"+) (h) hearts at P8 followed by trichrome staining 
at 28 DPR; the scar area was quantified in j; n = 10 (control mice), 7 (Pitx2-overexpressing 
mice). k, l, Echocardiography showed ejection fraction (k) 
and fractional shortening (l) at 28 DPR (see Methods for n). m-o, EdU 
labelling of Pitx2-overexpressing (n) and control (m) apical area, 8 days after 
P8 resection; sections were stained for cTnT (green), EdU (yellow), and DAPI 
(blue). Arrow indicates EdU-labelled cardiomyocytes, with quantification 
in o; n = 4 mice per group. Mean ± s.e.m. *P < 0.05, one-way ANOVA plus 
Bonferroni post-test (g, k, l) and Mann-Whitney test (b, d, j, o). 
Extended Data Figure 3 | Pitx2 is required for Hippo-deficient heart 
regeneration. a, Schematic study plan for Fig. 3a-e. b-e, Trichrome-stained 
apical areas of control (b), Salv CKO (c) and double knockout 
(d) hearts 21 days after P8 apex resection. Scar area was quantified in 
e. f, Heart weight to body weight ratio of sham hearts at 28 days after 
tamoxifen administration. For n number, see Methods. Mean ± s.e.m. 
*P < 0.05, Mann-Whitney. 
Extended Data Figure 4 | Co-occurrence of Pitx2 and Tead DNA-binding 
motifs in fetal heart enhancers. a, Consensus Pitx2 and Tead 
motifs. b, Pitx2 and Tead motif co-occurrence in fetal heart DHS peaks. 
c, Aggregate plot of H3K4me1 in fetal heart ChIP-seq reads within 6 kb 
range of DHS peaks. d, Heat map of fetal heart H3K4me1 ChIP-seq or 
input read density in 6-kb regions of DHS peaks. DHS peaks were centred 
on the Pitx2 motif, Tead motif, Pitx2-Tead motifs, or randomly selected. 
The read density was in log2 scale. Blue, negative values; yellow, positive 
values. 
Extended Data Figure 5 | Generation of GST-tagged proteins and 
interaction between Pitx2 and Yap in vivo. a, The mouse Pitx2a, Pitx2c 
and truncated proteins were purified and run on a 10% SDS-PAGE gel, 
and Coomassie blue staining shows the GST fusion protein band with 
correct size (marked by asterisk). b, Coomassie blue staining of the 
purified GST-Yap, Yap cut by PreScission protease and pure Yap protein. 
c, Co-immunoprecipitation of Flag in Pitx2!8 ventricles at 5 DPR, and 
blotting of Yap, Pitx2 and Flag. 
Extended Data Figure 6 | Nuclei-shuttling of Nrf2 is independent 
of Pitx2. a, Western blotting of Pitx2, α-tubulin, and TATA-binding 
protein (TBP) of P19 cell fraction after H2O2, with or without Nrf2 
siRNA treatment. b, Immunofluorescent staining of Nrf2 (green) in P19 
control and Pitx2 knockout cells after vehicle or H2O2 treatment. DAPI, 
blue. Scale bars, 50 µm. c, The ratio of cells with nuclear Nrf2 over total 
cell number; n = 6 biological repeats. d, Blotting of α-tubulin and TBP 
to show cell fraction of P19 cells used in e. e, Co-immunoprecipitation 
of Nrf2 from nuclear and cytoplasmic fraction of P19 cells after vehicle or 
H2O2 treatment, blotting shows Nrf2 and Pitx2. f-h, 4 DPMI control 
(C57BL6) (f) and Nrf2""" (g) cross-sections stained for Pitx2 (red), 
cTnT (green), and DAPI (blue), with the ratio of cardiomyocytes with 
nuclei-localized Pitx2 quantified in h; n = 4 mice per group. Arrows, 
Pitx2+ cardiomyocytes. Mean ± s.e.m. *P < 0.05, one-way ANOVA plus 
Bonferroni post-test (c) and Mann-Whitney test (h). 
Extended Data Figure 7 | Nrf2 is required for neonatal myocardial regeneration. a, Trichrome images of Nrf2" and control heart (C57BL6) at 
21 days after P2 LAD-O, along with sham controls. b, c, Ejection fraction (b) and fractional shortening (c) of LAD-O and sham hearts (see Methods for n). 
Extended Data Figure 8 | Yap1 and Nrf2 are essential for Pitx2-induced myocardial regeneration. a-d, Trichrome staining showing apical scarring of 
different groups at 28 DPR; apex resection was performed at P8. e, Quantification of scar area; n = 4 mice per group. Mean ± s.e.m. *P < 0.05, 
Mann-Whitney test, control compared to the other three groups individually. 
Extended Data Figure 9 | Pitx2 regulates antioxidant scavenger genes. 
a, Overall change of genes in Pitx2 CKO mice compared to control. 
b, Upregulated genes in 5 DPR control over wild-type sham heart 
(n = 480) overlaid with downregulated genes in 5 DPR Pitx2 CKO over 5 
DPR control heart (n = 1,002). c, GO analysis of genes upregulated (left) 
and downregulated (right) in Pitx2 CKO ventricles over controls at 5 DPR. 
d, GO analysis of genes upregulated (right) and downregulated (left) in 
5 DPR control ventricles over age-matched sham hearts. e, ChIP-qPCR 
confirming the binding of Pitx2 to the regulatory regions of target genes; 
n = 4 biological replicates. f, qPCR detecting Pitx2 and antioxidant genes 
in wild-type and Pitx2"’/" ES cells after vehicle or H2O2 treatment; n = 4 
biological replicates. g, qPCR of antioxidant genes in P19 cells after 
doxorubicin or H2O2 treatment; n = 5 biological replicates. h, qPCR of 
Pitx2 in P19 cells after doxorubicin or H2O2 treatment; n = 5 biological 
replicates. Mean ± s.e.m. *P < 0.05; ***P < 0.001, Mann-Whitney test. 
Extended Data Figure 10 | Mechanism model of Pitx2, Nrf2 and Yap1 
responding to oxidative stress. When oxidative stress is low, Nrf2 is 
sequestered in cytoplasm by its degradation complex (Cul3, Keap1), and 
Pitx2 stays either in the cytoplasm or at low expression levels. When the 
redox balance is disturbed by ROS, Nrf2 breaks away from the degradation 
complex, and enters nuclei to upregulate Pitx2 gene expression; Nrf2 also 
binds cytoplasmic Pitx2 and shuttles it to the nuclei, where Pitx2 and Yap 
co-regulate their common targets including critical antioxidant genes. 


In wild-type adult mouse heart, active Yap is maintained at a low level, 
even after ischaemic injury, and is thus not able to repair myocardium 
efficiently. When Pitx2 is overexpressed in cardiomyocytes, sufficient 
amounts of Pitx2 will cooperate with low levels of resident active Yap to 
induce the expression of beneficial antioxidant scavengers in a synergetic 
pattern, rendering protection to injured myocardium. Red arrow, 
supported by in vitro evidence; Blue arrows, supported by in vivo evidence. 
Feedback modulation of cholesterol metabolism by 
the lipid-responsive non-coding RNA LeXis 


Tamer Sallam, Marius C. Jones, Thomas Gilliland, Li Zhang, Xiaohui Wu, Ascia Eskin, Jaspreet Sandhu, David Casero, 
Thomas Q. de Aguiar Vallim, Cynthia Hong, Melanie Katz, Richard Lee, Julian Whitelegge & Peter Tontonoz 


Liver X receptors (LXRs) are transcriptional regulators of cellular 
and systemic cholesterol homeostasis. Under conditions of excess 
cholesterol, LXR activation induces the expression of several genes 
involved in cholesterol efflux’, facilitates cholesterol esterification by 
promoting fatty acid synthesis’, and inhibits cholesterol uptake by 
the low-density lipoprotein receptor’. The fact that sterol content is 
maintained in a narrow range in most cell types and in the organism 
as a whole suggests that extensive crosstalk between regulatory 
pathways must exist. However, the molecular mechanisms that 
integrate LXRs with other lipid metabolic pathways are incompletely 
understood. Here we show that ligand activation of LXRs in mouse 
liver not only promotes cholesterol efflux, but also simultaneously 
inhibits cholesterol biosynthesis. We further identify the long 
non-coding RNA LeXis as a mediator of this effect. Hepatic LeXis 
expression is robustly induced in response to a Western diet (high in 
fat and cholesterol) or to pharmacological LXR activation. Raising 
or lowering LeXis levels in the liver affects the expression of genes 
involved in cholesterol biosynthesis and alters the cholesterol levels 
in the liver and plasma. LeXis interacts with and affects the DNA 
interactions of RALY, a heterogeneous ribonucleoprotein that acts 
as a transcriptional cofactor for cholesterol biosynthetic genes in 
the mouse liver. These findings outline a regulatory role for a non- 
coding RNA in lipid metabolism and advance our understanding of 
the mechanisms that coordinate sterol homeostasis. 

It is well established that the cholesterol biosynthetic pathway is 
downregulated under conditions in which sterols are abundant through 
the inhibition of sterol regulatory element-binding protein (SREBP) 
processing’. Notably, however, under conditions in which hepatic cho- 
lesterol content was not enriched, activation of LXRs with the selec- 
tive synthetic agonist GW3965 also acutely suppressed the expression 
of sterol synthesis genes in mouse liver (Fig. 1a and Extended Data 
Fig. 1a). The effect could not be explained by changes in intracellular 
cholesterol levels, as LXR activation has been shown to lower hepatic 
cholesterol content*, which would lead to upregulation of the SREBP-2 
pathway. 

To investigate the mechanism by which LXRs suppress cholesterol 
biosynthesis, we performed genome-wide transcriptional profil- 
ing on primary mouse hepatocytes treated with vehicle or GW3965 
(Extended Data Fig. 1b). The most robustly induced gene in our RNA- 
sequencing (RNA-seq) analysis was a predicted non-coding RNA anno- 
tated as 4930412L05Rik (Extended Data Fig. 1c). Parallel profiling of 
non-coding and protein-coding transcripts using microarrays also 
identified 4930412L05Rik as the highest induced transcript (Extended 
Data Fig. 1d). We named this transcript LeXis (liver-expressed LXR- 
induced sequence). Notably, the LeXis gene locus lies in close proximity 
to the canonical LXR target gene Abca1 in mouse. Analysis 
of chromatin structure from The ENCODE Project®” indicated 
that LeXis and Abca1 were distinct genes with separate promoters 


(Fig. 1b). We defined the transcripts produced from the LeXis gene 
using rapid amplification of complementary DNA ends (RACE) 
(Extended Data Fig. 2). LeXis and Abca1 were induced by LXR and 
retinoid X receptor (RXR) agonists (GW3965 and LG268, respectively) 
in primary hepatocytes in an LXR-dependent manner (Fig. 1c and 
Extended Data Fig. 3a). LeXis was induced in LXRα−/− and LXRβ−/− 
(also known as Nr1h3−/− and Nr1h2−/−, respectively) hepatocytes, 
indicating that both LXR isotypes are capable of regulating LeXis 
(Extended Data Fig. 3b). Induction of LeXis was not sensitive to the 
protein synthesis inhibitor cycloheximide, and was not dependent on 
SREBPs, since 25-hydroxycholesterol (which blocks SREBP processing) 
also induced LeXis (Extended Data Fig. 3c, d). 

Administration of GW3965 to mice induced the expression of LeXis 
in several metabolically active tissues (Fig. 1d and Extended Data 
Fig. 3e). We also observed a prominent, LXR-dependent induction 
of LeXis expression in response to Western diet feeding, consistent 
with a potential role for LeXis in the response to cholesterol excess 
(Fig. 1e). Despite being physically adjacent, the LeXis and Abca1 loci are 
regulated independently. LeXis was neither expressed at baseline nor 
induced by LXR in mouse peritoneal macrophages, a cell type in which 
Abca1 expression is prominent (Fig. 1f). A luciferase reporter containing 
the LeXis promoter was induced by LXR and RXR in co-transfection 
assays (Extended Data Fig. 3f), and we identified an LXR-response 
element within the LeXis promoter region that was bound by LXRα in 
chromatin immunoprecipitation and quantitative PCR (ChIP-qPCR) 
assays (Extended Data Fig. 3g). The coding potential calculator and 
coding-non-coding index algorithms predict low coding potential of 
LeXis (Extended Data Fig. 3h, i). In addition, we found no evidence of 
production of a protein product from LeXis using in vitro transcription-translation 
assays (Extended Data Fig. 3j). 

To explore the function of LeXis in vivo, we transduced mice with 
adenoviral vectors encoding green fluorescent protein (GFP) con- 
trol or LeXis (Fig. 2a and Extended Data Fig. 4a). Remarkably, LeXis 
expression decreased serum cholesterol, but not triglycerides, in chow- 
fed C57BL/6 mice (Fig. 2a, b). No differences in liver function tests 
were observed between the two groups, and there was no evidence 
of ER stress or inflammation (Fig. 2b and Extended Data Fig. 4b, c). 
Fractionation of lipoproteins revealed reduced cholesterol in both the 
low-density lipoprotein (LDL) and high-density lipoprotein (HDL) 
fractions in LeXis-expressing mice (Fig. 2c). The effects of LeXis were 
distinct from the consequences of hepatic expression of other LXR 
target genes, such as Abca1 and Idol (also known as Mylip), which raise
serum cholesterol⁸,⁹.

Unbiased pathway analysis of global gene expression revealed that 
the cholesterol biosynthetic pathway was strongly downregulated in 
LeXis-transduced livers (Extended Data Fig. 4d). These results were 
validated by qPCR (Fig. 2d). These results suggested that the cholesterol-lowering
effects of LeXis were due, at least in part, to suppression
Figure 1 | LXR activation inhibits cholesterol biosynthesis and induces
LeXis expression. a, qPCR analysis of gene expression in livers from
C57BL/6 mice treated by oral gavage with 40 mg kg⁻¹ GW3965 for
the indicated time (n = 6 mice per group). All curves are statistically
different from baseline expression (P < 0.05, one-way analysis of variance
(ANOVA)). b, Schematic representation of the LeXis gene locus on an
Integrative Genome Viewer (IGV) showing histone marks from LICR
ENCODE data. c, qPCR analysis of gene expression in primary mouse
hepatocytes treated with GW3965 (GW; 1 μM) and/or the RXR ligand
LG268 (LG; 50 nM). DKO, double knockout (LXRα−/− and LXRβ−/−).


of cholesterol biosynthesis. Consistent with this interpretation, 
we observed a strong trend towards lower cholesterol content in the 
livers of mice overexpressing LeXis (Extended Data Fig. 4e). For reasons 
that are not yet clear, treatment of isolated primary hepatocytes did not 
reflect the effects of either LXR agonist treatment or LeXis expression 
on genes linked to sterol synthesis (Extended Data Fig. 4f, g). 

A reduction in plasma cholesterol suggests an increase in lipoprotein
clearance or a decrease in sterol production¹⁰. To assess the contribution
of the low-density lipoprotein receptor (LDLR) to the actions of
LeXis, we transduced Ldlr−/− mice with control or LeXis-expressing
adenovirus. We observed decreases in plasma cholesterol levels and
hepatic cholesterol content in response to LeXis in Ldlr−/− mice,
suggesting that the LDLR is not required for LeXis effects (Fig. 2e
and Extended Data Fig. 4h, i). To assess the contribution of SREBP-2
signalling to LXR-mediated inhibition of cholesterologenesis, we
administered GW3965 to control or liver-specific SCAP (L-SCAP)
knockout mice¹¹. Consistent with previous studies¹², GW3965 treatment
did not alter serum cholesterol levels in control mice (Fig. 2f).
Notably, however, GW3965 increased serum cholesterol levels in 
L-SCAP knockout mice, suggesting the loss of a suppressive effect 
(Fig. 2f, g). LXR target genes, including LeXis itself, were induced by 
GW3965 in both groups; however, the suppression of steroidogenic 
genes was abrogated in L-SCAP knockout mice (Extended Data Fig. 4j).
Furthermore, expression of LeXis also failed to lower serum cholesterol 
or suppress cholesterogenic gene expression in L-SCAP knockout mice 
(Fig. 2h and Extended Data Fig. 4k). 

To address the role of LeXis in the setting of dietary cholesterol 
challenge, we used adenoviral vectors to express short hairpin RNA 


Results are representative of four independent experiments. d, qPCR
analysis of gene expression in livers from male C57BL/6 mice gavaged
with 40 mg kg⁻¹ GW3965 before collection at the indicated time (n = 6
per group). e, Gene expression in livers obtained from mice maintained
on chow (n = 2 per group) or a Western diet (n = 5 per group). f, Gene
expression in primary mouse peritoneal macrophages treated with 1 μM
GW3965 and/or 50 nM LG268 for 16 h. Results are representative of four
independent experiments. Values are mean ± s.d. (c, f), or mean ± s.e.m.
(a, d, e). *P < 0.05; **P < 0.01 (analysis of variance (ANOVA) with multi-group
comparison in a, d and e).


(shRNA) constructs targeting LeXis in mouse liver¹³,¹⁴. Knockdown of
LeXis with either of two different shRNA constructs increased serum
HDL cholesterol levels in mice fed a Western diet (Extended Data
Fig. 5a-d). There was also an increase in liver cholesterol content in
shLeXis-transduced mice (Extended Data Fig. 5e). Gene expression 
analysis revealed increased expression of cholesterol biosynthetic 
genes in response to LeXis knockdown (Extended Data Fig. 5f). 
Similar effects of LeXis knockdown were observed in mice treated with 
GW3965 (Extended Data Fig. 5g, h). There was no consistent evidence 
of ER stress or inflammation in these experiments (Extended Data 
Fig. 5i-k). 

As a complementary acute loss-of-function approach, we used antisense
oligonucleotides (ASOs) to target LeXis expression¹⁵,¹⁶. Three
different ASOs that potently blocked hepatic LeXis, but not saline or 
non-targeting ASO controls, increased serum cholesterol levels in the 
setting of LXR activation, with no evidence of hepatotoxicity (Fig. 3a, b 
and Extended Data Fig. 5l, m). Furthermore, LeXis ASO administration
increased cholesterogenic gene expression (Fig. 3c). 

We generated LeXis-deficient mice to determine the consequences 
of chronic loss of LeXis function (Extended Data Fig. 6a-c). Although
serum cholesterol levels in LeXis-deficient mice in the setting of LXR 
activation were not different from controls (Fig. 3d), the expression of 
sterol synthesis genes in the liver was increased (Fig. 3e). Furthermore, 
LeXis-null mice had increased hepatic cholesterol content when chal- 
lenged with a Western diet (Fig. 3f). Gross and histological examination 
of livers from LeXis-deficient mice showed changes consistent with
lipid accumulation (Fig. 3g, h). In contrast to the acute LXR agonist 
studies above, gene expression analysis of LeXis−/− mice maintained
Figure 2 | LeXis expression reduces serum cholesterol and sterol
synthesis through a pathway requiring intact SREBP signalling.
a, Total serum cholesterol levels in 10-week-old chow-fed male C57BL/6
mice transduced with adenoviral vectors encoding GFP control (Ad-GFP)
or LeXis (Ad-LeXis) for 6 days (n = 24 per group). b, Total serum
triglycerides levels in the mice shown in a (n = 12-16 per group).
c, Cholesterol levels in pooled fractionated serum from mice treated
with Ad-GFP or Ad-LeXis. VLDL, very-low-density lipoprotein.
d, Analysis of gene expression in livers obtained after 6 days of
transduction with Ad-GFP or Ad-LeXis (n = 8 per group). e, Total serum
cholesterol levels in chow-fed male Ldlr−/− mice (10 weeks old)
transduced with Ad-GFP or Ad-LeXis for 6 days (n = 8 per group).
f, Serum cholesterol levels in chow-fed wild-type (WT) or liver-specific
SCAP knockout (Scap−/−) mice gavaged with 40 mg kg⁻¹ GW3965 for
2 days. g, Cholesterol levels in pooled plasma fractions from mice shown
in f. h, Total serum cholesterol levels in chow-fed Scap−/− mice
transduced with Ad-GFP or Ad-LeXis for 6 days. All values are
mean ± s.e.m. NS, not significant; *P < 0.05; **P < 0.01; ***P < 0.001;
****P < 0.0001 (unpaired two-tailed t-test).


Figure 3 | Acute and chronic inactivation of LeXis alters hepatic lipid
metabolism. a, LeXis gene expression (normalized to 36B4, also known as
Rplp0) in livers from C57BL/6 mice on a chow diet administered
25 mg kg⁻¹ ASOs intraperitoneally on days 1, 4 and 7, and gavaged with
40 mg kg⁻¹ GW3965 on days 4, 7 and 8 (n = 5 per group). Ctrl, control.
b, Total serum cholesterol from mice in a. c, Gene expression from
C57BL/6 mice on a chow diet administered 25 mg kg⁻¹ ASOs
intraperitoneally on days 1, 3 and 5, and gavaged with 40 mg kg⁻¹
GW3965 on days 5 and 6 (n = 8 per group). d, Total serum cholesterol
levels in chow-fed wild-type or LeXis−/− mice gavaged with 40 mg kg⁻¹
GW3965 for 2 days (n = 8-10 per group). e, Gene expression from C57BL/6
wild-type or LeXis−/− mice on a chow diet gavaged with 40 mg kg⁻¹
GW3965 for 2 days (n = 8-10 per group). f, Hepatic cholesterol content
was normalized to liver mass from C57BL/6 wild-type or LeXis−/− mice
fed a Western diet for 3 weeks (n = 7-11 per group). g, Representative
(of three images per group) gross appearance of livers from wild-type
and LeXis−/− mice after 3 weeks on a Western diet. h, Histological
sections of liver from wild-type and LeXis−/− mice after 3 weeks on a
Western diet (haematoxylin and eosin stain representative of three
images per group). Original magnifications, ×40 (g) and ×1 (h). All
values (a-f) are mean ± s.e.m. *P < 0.05; **P < 0.01; ***P < 0.001;
****P < 0.0001 (ANOVA (b, c) and unpaired two-tailed t-test (e, f)).
Figure 4 | LeXis interacts with RALY to regulate metabolic gene
expression. a, Hepa1-6 cells were transfected with LeXis, and 24 h later
cellular content was separated into cytoplasmic soluble (cyt), nuclear
soluble (nuc) and insoluble pellet (pel) fractions. Transcripts in each
fraction were analysed by qPCR, and fraction purity was validated by
western blotting with the indicated compartment markers (n = 3 per
group). b, Representative (of three) micrograph showing LeXis subcellular
localization in primary mouse hepatocytes by single molecule fluorescence
in situ hybridization using anti-sense probes to LeXis (red). Nuclei
were counterstained with DAPI (blue). Original magnification, ×63.
c, Recruitment of RNA polymerase II (RNPII) to promoter regions as
determined by ChIP-qPCR analysis in livers transduced with control


on a Western diet showed a trend towards decreased sterol synthetic gene
expression, probably reflecting the markedly increased hepatic cholesterol
content in this setting (Extended Data Fig. 6d).

To begin to understand how LeXis was influencing hepatic metabo- 
lism, we analysed its subcellular localization. LeXis was almost exclu- 
sively located in the insoluble nuclear pellet in fractionation studies, 
along with the known nuclear long non-coding RNAs (lncRNAs) XIST
and histone H3 (Fig. 4a). Single molecule RNA fluorescence in-situ 
hybridization with LeXis-specific probes further confirmed its nuclear 
localization (Fig. 4b). 

Owing to the presence of LeXis in the nucleus, we tested its ability 
to affect RNA polymerase II-dependent transcription. Expression of 
LeXis in mouse liver reduced RNA polymerase II engagement at the 
promoters of Srebf2 and its target genes (Fig. 4c). Previous work has 
shown that nuclear lncRNAs can affect transcription by modifying the
recruitment of proteins to chromatin¹⁷. We used an unbiased lncRNA-chromatin
affinity capture technique to pull down LeXis from mouse
liver and identify interacting proteins¹⁸ (Extended Data Fig. 7a).
Analysis of the LeXis interactome by mass spectrometry identified
the heterogeneous ribonucleoprotein RALY¹⁹ as a binding partner.
Similar to LeXis, RALY was located in the nuclear pellet (chromatin)
fraction of hepatocytes (Extended Data Fig. 7b). Moreover, an
antibody to RALY retrieved LeXis in co-immunoprecipitation studies 
(Extended Data Fig. 7c, d). 

RALY contains both an RNA-binding domain and a leucine-zipper
coiled domain, suggesting it may act as a regulatory factor²⁰.
(Ad-GFP) or LeXis-expressing (Ad-LeXis) adenoviruses. Data are
expressed as percentage input retrieved normalized to an upstream site
(region 1) (n = 3 per group). d, Total serum cholesterol levels in
14-week-old chow-fed male C57BL/6 mice transduced with control (shCtrl)
or adenoviral vectors expressing Raly shRNA (shRaly) (n = 8 per group).
e, Gene expression in livers of the mice shown in f. f, Total serum
cholesterol in chow-fed male C57BL/6 mice transduced with control
(Ad-GFP) or Ad-LeXis (1.0 × 10⁹ plaque-forming units, p.f.u.) and shCtrl
or shRaly (2.0 × 10⁹ p.f.u.) (n = 7-8 per group). Values are mean ± s.d.
(a, c) or mean ± s.e.m. (d). *P < 0.05; **P < 0.01; ***P < 0.001 (unpaired
two-tailed t-test (d, e) and ANOVA with multi-group comparison (f)).


Notably, previous unbiased analysis of gene coexpression networks
has identified Srebf2 as one of the top genes positively coregulated
with Raly²¹. Other studies have shown direct binding of SREBP-2 at
the Raly promoter²². Unbiased protein homology analysis revealed
extensive structural conservation between RALY and RNA binding
motif protein 14 (RBM14, also known as CoAA)²³ (Extended Data
Fig. 7e), a known steroid receptor coactivator²⁴. This led us to hypothesize
that RALY may act as a transcriptional cofactor for genes involved
in cholesterol biosynthesis. In line with this idea, adenovirus-mediated
knockdown of RALY in mouse liver reduced serum cholesterol, 
mimicking the effect of LeXis expression (Fig. 4d and Extended Data 
Fig. 7f). This effect was correlated with reduced expression of Srebf2 
and its target genes (Fig. 4e and Extended Data Fig. 7g). Unbiased gene 
expression profiling of liver revealed that RALY knockdown prefer- 
entially affected cholesterol biosynthetic pathways (Extended Data 
Fig. 8a, b). The effects of RALY were independent of LDLR expression, 
since they were preserved in Ldlr-null mice (Extended Data Fig. 9a, b). 
The actions of LeXis in vivo were dependent on RALY, since the ability 
of LeXis to alter serum cholesterol levels and hepatic gene expression 
was impaired in the setting of RALY knockdown (Fig. 4f and Extended
Data Fig. 9c). Finally, ChIP-qPCR analysis of mouse liver revealed that 
RALY associated with cholesterol biosynthetic gene promoters, and 
that RALY occupancy was reduced in the setting of LeXis expression 
(Extended Data Fig. 9d). 

This work identifies the non-coding RNA LeXis as an additional 
mediator of the complex effects of LXR signalling on hepatic lipid 
metabolism. Our data suggest that LeXis contributes to the ability of
LXRs to inhibit cholesterol synthesis. It is important to acknowledge, 
however, that the involvement of additional pathways in this crosstalk 
is not excluded by the present work. The demonstration that LeXis 
expression is responsive to dietary cues and can modulate physiological 
pathways with links to common diseases expands our understanding of 
the regulatory potential of non-coding RNA. Notably, the consequences 
of acute and chronic loss of LeXis expression are only partially overlapping,
perhaps reflecting compensation in the setting of developmental
deletion²⁵.

Although the rapid sequence evolution of lncRNAs presents a challenge
to identifying functional counterparts between species²⁶, batch
coordinate conversion between mouse and human assemblies revealed 
moderate conservation of the LeXis genomic sequence in a region 
adjacent to the human ABCA1 gene. An annotated putative ncRNA 
(TCONS_00016452) in this region was robustly induced by LXR acti- 
vation in human hepatocyte cell lines (Extended Data Fig. 10). In the 
future it will be of interest to assess whether this sequence or an as yet 
to be identified lncRNA is a functional orthologue of LeXis.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Reagents, plasmids and gene expression. GW3965 was synthesized as previ- 
ously described’’. LG268 was from Ligand Pharmaceuticals. Oxysterols were 
purchased from Sigma and used as described**. Simvastatin sodium salt was from 
Calbiochem. Ligands were dissolved in dimethyl sulfoxide before use in cell culture. 
LeXis was amplified from GW3965-treated primary mouse hepatocytes using KOD
polymerase (Millipore) and primers designed to provide flanking attB sequences 
and a SacI site at the immediate 3’ end. The fragments were then cloned into 
pDONR221 using the Gateway system and the minimal SV40 polyadenylation 
sequence was inserted at the SacI site. For transient transfections and viral vector 
production the entry clone was transferred into the pAd/CMV/V5-DEST Gateway 
vector by LR recombination. We estimate transcription from this vector to append 
109 nucleotides at the 5’ end, and 29 nucleotides at the 3’ end of the cloned LeXis 
sequence. To obtain the shLeXis adenovirus, we used the BLOCK-iT kit as described
(Invitrogen)°. In brief, Invitrogen-based software was used to generate oligonucleotides
targeting the LeXis fragment, which were cloned into pENTR/U6. The resulting
pENTR/U6-LeXis shRNA plasmids were tested for their ability to inhibit
overexpressed LeXis in transient transfection experiments in HEK293T cells and
then transferred by Gateway recombination into the pAd/BLOCK-iT-DEST destination
vector for viral particle generation. Viruses were amplified, purified and
titred by Viraquest. For gene expression analysis, RNA was isolated using TRIzol 
reagent (Invitrogen) and analysed by qPCR using an Applied Biosystems 7900HT 
sequence detector or Applied Biosystems QuantStudio 6 Flex. Results are normalized
to 36B4 or cyclophilin (also known as Ppia). Immunohistochemical staining
of paraffin-embedded livers was done by the UCLA Translational Pathology Core
Laboratory.
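
As an illustration of normalizing qPCR results to a reference gene such as 36B4, the sketch below uses the widely used 2^(−ΔΔCt) calculation. The text above states only that results were normalized to 36B4 or cyclophilin; the ΔΔCt formula, the function name and the example Ct values are assumptions made for illustration, not the authors' documented procedure.

```python
def relative_expression(ct_target, ct_reference, ct_target_ctrl, ct_reference_ctrl):
    """Fold change of a target gene relative to a control sample.

    Assumes the 2^(-ddCt) method with ~100% amplification efficiency
    (an assumption; the Methods do not specify the calculation).
    """
    # Normalize the target gene to the reference gene within each sample.
    delta_sample = ct_target - ct_reference
    delta_control = ct_target_ctrl - ct_reference_ctrl
    # Compare the treated sample with the control sample.
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target crosses threshold 2 cycles earlier
# after treatment while the reference gene is unchanged.
fold = relative_expression(24.0, 18.0, 26.0, 18.0)
print(f"fold change vs control: {fold:.1f}")
```

A reference gene is useful here only if its own expression is unaffected by the treatment, which is why two alternatives (36B4 or cyclophilin) are common practice.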

Animals and diets. All animals (C57BL/6, greater than 10 generations back- 
crossed) were housed in a temperature-controlled room under a 12-h light/12-h 
dark cycle and pathogen-free conditions. For adenovirus experiments, age-matched
mice were purchased from Jackson Laboratories. Littermates were manually
randomized to different treatment groups. Investigators were blinded to group
allocation for some but not all studies. LXRα−/−, LXRβ−/− and LXRαβ−/− mice
were originally provided by D. Mangelsdorf. Floxed Scap~/~ mice were previously 
described”. LeXis global knockout mice were generated at UCDavis KOMP using 
the strategy outlined in Extended Data Fig. 6. Mice were fed a chow diet except as
indicated, where mice were placed on a Western diet (21% fat, 0.21% cholesterol; 
D12079B; Research Diets Inc.) or were gavaged with either vehicle or 40 mg kg⁻¹
GW3965. Livers were obtained 4 h after the last gavage. We measured cholesterol
and triglycerides as previously described*’. For adenoviral infections, age-matched 
(9-11 weeks old) male mice were injected with 2.0 × 10⁹ p.f.u. by tail-vein injection
unless otherwise specified. Mice were euthanized 6 days later after a 6-h fast. At
the time of euthanization, liver tissue and blood (by cardiac puncture) were collected,
immediately frozen in liquid nitrogen and stored at −80°C. Liver tissue was
processed for isolation of RNA and protein as above. Generation 2.5 constrained 
ethyl ASOs, synthesized as described previously*!, were administered by three 
25 mg kg⁻¹ intraperitoneal doses together with 40 mg kg⁻¹ GW3965. Animals
were euthanized on day 6 or 8 as indicated in figure legends. Most experiments 
were performed using male mice. All animal experiments were approved by the 
UCLA Institutional Animal Care and Research Advisory Committee. 

Cell culture. Primary peritoneal macrophages were isolated 4 days after thiogly- 
collate injection and prepared as described**. Mouse primary hepatocytes were 
isolated as previously described and cultured in William’s E medium with 5% 
FBS”®. Peritoneal cells were incubated in 0.5% FBS in DMEM, with 5 μM simvastatin
and 100 μM mevalonic acid. Five to eight hours later, cells were pretreated
with dimethylsulfoxide (DMSO) or appropriate ligand overnight. The in vitro translation
assay was performed using the TnT Coupled Transcription/Translation System
(Promega) according to the manufacturer’s protocol. The cell lines HEK293T,
HEK293A and Hepa1-6 were originally obtained from ATCC. All cells were tested
for mycoplasma contamination. 

RACE. The 5′ and 3′ ends of the LeXis transcript were defined using mouse liver
RNA and the FirstChoice RLM-RACE kit (Ambion) according to the manufacturer’s
protocol, with modifications. In brief, for the 5′ RACE, degraded messenger
RNA 5’ ends were dephosphorylated with CIP, and then full-length mRNA was 
decapped with TAP. Following 5’ RACE adaptor ligation, reverse transcription 
was performed using SuperScriptIII First-Strand Synthesis system (Invitrogen) 
and LeXis-specific primers. For the 3’ RACE, RNA was reverse transcribed using 
SuperScriptIII First-Strand Synthesis system (Invitrogen) and the adaptor-linked 
oligo dTs. The resulting cDNA was amplified by nested PCR across a 55-65 °C 
melting temperature gradient using KOD polymerase (Millipore), with the inner 
primers containing attB sequences. Aliquots of reactions were inspected on 1% 
agarose gels for product size and abundance. Products of select PCR reactions were 
purified using NucleoSpin Gel and PCR Cleanup kit (Clontech) and were inserted 
into pDONR221 by Gateway cloning. Cloned fragments were sequenced and then
aligned to the mouse genome with the BLAST analysis tool. 

RNA fractionation. The pAd/CMV-LeXis vector was transfected into Hepa1-6
cells using BioT reagent (Bioland Scientific LLC) and 24 h later subcellular RNA
fractions were obtained according to the protocol described previously³³. Lysate
aliquots were inspected for fractional purity by western blotting with antibodies
against α-tubulin, SNRP70 and histone H3 as cytoplasmic, nucleoplasmic and
chromatin-bound markers, respectively.

RNA-seq. RNA-seq libraries, starting with 500 ng total RNA, were constructed 
with the TruSeq RNA Sample Prep Kits from Illumina on RNA isolated from 
primary hepatocytes treated with or without GW3965. Samples were indexed 
with adapters and submitted for paired-end 2 × 100-bp sequencing on an Illumina
HiSeq 2000. RNA-seq reads were aligned with TopHat v2.0.2 to the mouse genome,
version mm9 (ref. 34). The TopHat alignment rate was 85%, resulting in an aver- 
age of 65 million reads per sample. Transcripts were assessed and quantities were 
determined by Cufflinks v2.0.2, using a GTF file based on Ensembl mouse NCBI37. 
Comparisons of expression levels were made using fragments per kilobase of exon
per million fragments mapped (FPKM) values using Cuffdiff from the Cufflinks 
package*®. Data analysis was performed by UCLA DNA Microarray Core. 
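
To make the FPKM unit concrete, the following minimal sketch computes fragments per kilobase of exon per million fragments mapped from raw counts. It shows only the basic definition of the unit; Cufflinks and Cuffdiff estimate abundances with a statistical model, so this is not a reimplementation of those tools, and the function name and example numbers are hypothetical.

```python
def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Basic FPKM definition: fragments per kilobase of exon per million mapped.

    FPKM = fragments * 1e9 / (exon length in bp * total mapped fragments)
    """
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("exon length and library size must be positive")
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


# Hypothetical transcript: 1,300 fragments on 2,000 bp of exon in a
# library of 65 million mapped fragments (library size chosen to echo
# the average depth reported above).
value = fpkm(1300, 2000, 65_000_000)
print(f"FPKM = {value:.1f}")
```

Dividing by both transcript length and library size is what allows expression values to be compared between transcripts of different length and between samples sequenced to different depth.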

Lipid analysis. Tissue lipid was obtained using a Folch extraction. In brief, 
chloroform extracts were dried under nitrogen and solubilized in water. Tissue 
and serum cholesterol and triglycerides were determined using a commercially 
available enzymatic kit (Wako). Hepatic cholesterol content was normalized to 
liver weight and protein concentration. Mice were fasted for at least 6h before 
blood collection and euthanization. Plasma lipoprotein fractions were analysed 
by FPLC. 

Microarray. For cDNA microarray analysis, primary hepatocytes were treated
as indicated above with either DMSO or GW3965. These samples were from an
independent cohort from those submitted for RNA-seq. For each condition, two 
independent samples were processed. Transcriptional profiling was performed at 
the University of California, Los Angeles, microarray core facility by using Agilent 
SurePrint G3 Gene Expression array. Data were analysed using GeneSpring soft- 
ware (Agilent Technologies) and David**. 

ChIP. ChIP studies were performed as described elsewhere*”. In brief, mouse 
livers were cross-linked using a final formaldehyde concentration of 1% at room 
temperature for 10 min. The reaction was quenched with the addition of glycine. 
For sonication, 0.3 ml (1/3) of nuclear lysate was sonicated for 25-30 cycles, 30s on 
30s off at 4°C, with BioRuptor twin sonicator (Diagenode). Sonicated chromatin
was incubated overnight at 4°C with control IgG or 25 μg of anti-LXRα antibody
(PPZ0412, ChIP Grade, Abcam), anti-RALY antibody (EPR10121, Abcam), or
Pol II antibody (N-20, Santa Cruz Biotechnology). Protein A Dynabeads (50 μl per
immunoprecipitation sample) were added for 4 h. After incubation beads were
washed with wash buffer A (50 mM HEPES, pH 7.9, 140mM NaCl, 1mM EDTA, 
1% Triton X-100, 0.1% Na-deoxycholate, 0.1% SDS, 1x protease inhibitors freshly 
added), buffer B (50 mM HEPES, pH 7.9, 500 mM NaCl, 1 mM EDTA, 1% Triton 
X-100, 0.1% Na-deoxycholate, 0.1% SDS) and finally LiCl buffer (20 mM Tris,
pH 8.0, 250 mM LiCl, 1 mM EDTA, 0.5% Na-deoxycholate, 0.5% NP-40). 
Reverse crosslinking was performed at 60°C overnight, mixed at 1,000 r.p.m., 
and DNA was extracted using a phenol-chloroform phase lock tube (5 PRIME) 
or Nucleospin PCR cleanup column (Macherey-Nagel). A standard curve for 
PCR was generated from serial dilutions of input samples and data expressed as 
percentage of input. 
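
The standard-curve step described above can be sketched as follows: a line is fitted to Ct versus log10 of the input fraction for the serial dilutions, and each immunoprecipitated sample's Ct is then converted back to a percentage of input. The dilution series, Ct values and function names are hypothetical, and the sketch assumes that the same volume of template is used for inputs and immunoprecipitates; any difference in the fraction of chromatin saved as input would need an extra scaling factor.

```python
import math


def fit_standard_curve(input_fractions, ct_values):
    """Least-squares fit of Ct = slope * log10(input fraction) + intercept."""
    xs = [math.log10(f) for f in input_fractions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def percent_input(ct_sample, slope, intercept):
    """Convert a sample Ct to percentage of input using the fitted curve."""
    log_fraction = (ct_sample - intercept) / slope
    return 100.0 * 10.0 ** log_fraction


# Hypothetical 10-fold serial dilutions of input (10%, 1%, 0.1%).
slope, intercept = fit_standard_curve([0.1, 0.01, 0.001], [25.0, 28.0, 31.0])
print(f"ChIP signal: {percent_input(28.0, slope, intercept):.2f}% of input")
```

Reading values off a dilution curve, rather than assuming perfect doubling per cycle, accounts for the actual amplification efficiency of each primer pair.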

Chromatin isolation by RNA purification. Chromatin isolation by RNA purifica- 
tion (ChIRP) was performed as described previously’®. In brief, mouse livers were 
cross-linked using glutaraldehyde. After glycine quenching, the nuclear lysate was 
sonicated for 25-30 cycles, 30s on 30s off at 4°C, with BioRuptor twin sonicator 
(Diagenode). LeXis and LacZ pulldown probes with BiotinTEG at 3’ were designed 
by Biosearch Technologies (see Supplementary Information) and allowed to hybridize
overnight with sonicated chromatin at 37 °C (100 pmol probe per 1 ml chromatin). 
After hybridization, C1 Dynabeads (Life Technologies) were added and incubated 
for 30 min. For protein elution for mass spectrometry analysis, washed beads were 
resuspended in 3x original volume of DNase buffer (100 mM NaCl and 0.1% 
NP-40), and protein was eluted with a cocktail of 50 mM triethyl ammonium 
bicarbonate, 12 mM sodium lauryl sarcosine, and 0.5% sodium deoxycholate supplemented
with 100 μg ml⁻¹ RNase A (Sigma-Aldrich) and 0.1 U μl⁻¹ RNase H
(Epicentre), and 100 U ml⁻¹ DNase I (Invitrogen). For RNA isolation, beads
were resuspended in proteinase K buffer (100 mM NaCl, 10 mM TrisCl, pH 7.0, 1 mM
EDTA, 0.5% SDS, 5% by volume proteinase K (AM2546, Ambion) 20 mg ml⁻¹) and
incubated at 50°C followed by Trizol isolation and DNase treatment.

Single molecule RNA FISH. Custom Stellaris FISH probes were designed 
against LeXis. The Stellaris probe set was labelled with CAL Fluor Red 610 and RNA FISH was
performed as described previously**. In brief, hepatocytes were fixed with 3.7% 
formaldehyde in PBS followed by 70% ethanol treatment to permeabilize cells. Cells
were washed with 10% formamide in 2× SSC followed by treatment in a humidified
chamber with addition of probes (125 nM) in hybridization buffer (100 mg ml⁻¹
dextran sulfate and 10% formamide in 2× SSC). Cells were incubated in the dark
at 37°C for 4 h. DAPI nuclear stain (5 ng ml⁻¹) was applied after washing with 10%
formamide in 2× SSC. Images were obtained using a Zeiss Z1 AxioObserver fluorescent
microscope.

Statistical analysis. A non-paired Student's t-test or ANOVA was used to deter- 
mine statistical significance, defined at P< 0.05. Unless otherwise noted, error bars 
represent s.d. Experiments were independently performed at least twice. Group
sizes were based on statistical analysis of variance and prior experience with similar 
in vivo studies. 
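
For readers less familiar with the quantities reported in the figure legends, the sketch below computes the s.e.m. of a group and the test statistic of an unpaired (equal-variance) Student's t-test using only the Python standard library. The group values are invented for illustration; obtaining a P value additionally requires the t distribution, for which a statistics package such as SciPy (`scipy.stats.ttest_ind`) would normally be used.

```python
import math
import statistics


def sem(values):
    """Standard error of the mean: sample s.d. divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def student_t(group_a, group_b):
    """Unpaired Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    # Pooled variance assumes both groups share the same true variance.
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    t = diff / math.sqrt(pooled * (1.0 / na + 1.0 / nb))
    return t, na + nb - 2


# Hypothetical measurements for a control and a treated group.
control = [1.0, 2.0, 3.0]
treated = [2.0, 3.0, 4.0]
t_stat, dof = student_t(control, treated)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
print(f"s.e.m. of control = {sem(control):.3f}")
```

The distinction matters when reading the legends: s.d. describes the spread of individual measurements, whereas s.e.m. shrinks with group size and describes the uncertainty of the group mean.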
Extended Data Figure 1 | Identification of LeXis as an LXR-responsive
lncRNA. a, qPCR analysis of gene expression in livers from mice
gavaged with 40 mg kg⁻¹ GW3965 for 2 days. Mice were fasted for 4 h
before collection (n = 4 per group). Values are mean ± s.e.m. *P < 0.05;
**P < 0.01; ***P < 0.001 (unpaired two-tailed t-test). b, Volcano plot
of RNA-seq results from primary hepatocytes treated for 16 h with 1 μM
GW3965. c, Relative expression of selected LXR target genes identified in
the RNA-seq study shown in b. Fold change represents ratio of transcript
expression in GW3965 compared to DMSO treatment samples. Cut-off
fold induction of 1.1 used (total 4,708 transcripts induced). d, Heat map
representation of the results of transcriptional profiling (Agilent SurePrint
G3 Gene Expression arrays) of primary hepatocytes treated with 1 μM
GW3965 for 16 h. Data were analysed using GeneSpring software.


© 2016 Macmillan Publishers Limited. All rights reserved 




[Extended Data Figure 2 image. Panel a: RNA-seq tracks labelled control and LeXis. Panel b (scale bar, 50 kb): existing annotation (Ensembl transcripts ENSMUST00000146501 and ENSMUST00000139222; RefSeq transcript XR_881279.1); major transcripts identified (clone 19, mRNA = 1,020 b; clone 18, mRNA = 1,212 b); minor transcripts identified.]

Extended Data Figure 2 | Schematic of the LeXis gene locus and its RNA
transcripts. a, UCSC genome browser view of RNA-seq transcriptional
signatures at the Abca1 and LeXis locus in mouse primary hepatocytes
treated with 1 μM GW3965 for 16 h. b, Exon structure of major and minor
LeXis transcripts identified by RACE, aligned for comparison to existing
annotation in the indicated databases.
Extended Data Figure 3 | Regulation of LeXis expression. a, PCR
analysis of primary mouse hepatocytes from wild-type or double knockout
(LXRα−/− and LXRβ−/−) mice treated with 1 μM GW3965 and/or 50 nM
LG268. Results are representative of four independent experiments.
b, LeXis expression in primary mouse hepatocytes from wild-type,
LXRα−/−, LXRβ−/− or double knockout mice treated with GW3965 and
LG268. Results are representative of three independent experiments.
c, LeXis expression in primary hepatocytes treated with GW3965 and
LG268 in the presence or absence of the protein synthesis inhibitor
cycloheximide (Chx, 1 μg μl⁻¹). Results are representative of three
independent experiments. d, LeXis expression in primary hepatocytes
treated with GW3965 and LG268 (50 nM) in the presence or absence of
25-hydroxycholesterol (25OH, 2.5 μM). Results are representative of three
independent experiments. e, Gene expression in tissues from C57BL/6
mice gavaged with 40 mg kg⁻¹ GW3965 for 3 days (n = 5 per group).
*P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001 (unpaired two-tailed
t-test). f, Relative firefly luciferase activity measured from the pGL4.10
vector or pGL4.10 with the LeXis promoter cloned upstream of luciferase.
Reporters were co-transfected in HEK293 cells and treated with GW3965
for 24 h. Activity is normalized to Renilla luciferase internal control.
g, Analysis of LXRα binding to the LeXis promoter in mouse liver by
ChIP-qPCR. Schematic shows primer pair positions relative to the LXR-
response element in the LeXis and Abca1 (positive control) promoters.
Primers flanking a region of the MAP kinase I promoter served as a
negative control. ChIP values are presented as percentage of input DNA
(n = 4 per group). Values are mean ± s.e.m. (e, g) or mean ± s.d. (a-d).
h, Prediction of coding potential using the coding-non-coding index
(CNCI) software. Negative value indicates low coding potential.
i, Comparison of protein coding potential using coding potential
calculator (CPC) score for LeXis, the non-coding gene HOTAIR, and
control protein-coding transcripts. j, In vitro translation of LeXis and
luciferase control RNAs.
Extended Data Figure 4 | LeXis modulates the expression of genes linked 
to sterol synthesis. a, Gene expression in livers obtained after 6 days 
of transduction with Ad-GFP or Ad-LeXis (n =8 per group). b, Serum 
alanine aminotransferase activity in chow-fed mice transduced with 
Ad-GFP or Ad-LeXis for 6 days (n= 8 per group). c, Gene expression in 
livers obtained after 6 days of transduction with Ad-GFP or Ad-LeXis 
(n=8 per group). d, Unbiased pathway analysis (GeneSpring software) of 
the results from transcriptional profiling of livers treated with Ad-GFP or 
Ad-LeXis (n = 4 per group). e, Hepatic cholesterol content normalized to 
liver mass in wild-type mice transduced with Ad-GFP or Ad-LeXis (n = 8 
per group). f, Gene expression in mouse hepatocytes treated overnight 
with 1 μM GW3965. Results are representative of two independent 
experiments. g, Gene expression in mouse hepatocytes treated overnight 
with Ad-GFP or Ad-LeXis for 24h. Results are representative of two 
independent experiments. h, Cholesterol levels in pooled fractionated 
serum from Ldlr−/− mice transduced with Ad-GFP or Ad-LeXis. i, Hepatic 
cholesterol content normalized to liver mass in Ldlr−/− mice transduced 
with Ad-GFP or Ad-LeXis (n = 8 per group). j, Gene expression in livers 
from chow-fed wild-type or liver-specific Scap−/− mice gavaged with 
40 mg kg−1 GW3965 for 2 days (n = 5 (WT Veh), 8 (WT GW), 5 (KO Veh) 
and 7 (KO GW)). k, Gene expression in livers from Scap−/− chow-fed mice 
transduced with Ad-GFP or Ad-LeXis for 6 days (n = 5 per group). Values 
are mean ± s.e.m. (a–c, e, i–k) or mean ± s.d. (f, g). *P < 0.05; **P < 0.01 
(unpaired two-tailed t-test). 
Extended Data Figure 5 | Inhibition of LeXis expression alters serum 
cholesterol level. a, In vitro validation of LeXis knockdown using shLeXis1 
and shLeXis8 vectors. Results are representative of three independent 
experiments. b, Total serum cholesterol measured in C57BL/6 mice 
fed a Western diet for 2 weeks and transduced with adenovirus shCtrl 
or shLeXis8 for 6 days (n = 6–8 per group). c, Cholesterol levels in 
pooled fractionated serum from mice transduced with shCtrl or shLeXis 
adenovirus. d, Total serum cholesterol from male C57BL/6 mice fed 
a Western diet for 2 weeks and then transduced with control (shCtrl) 
or adenoviral vectors expressing shRNA targeting LeXis (shLeXis1) 
(n = 8 per group). e, Hepatic cholesterol content normalized to liver 
mass for the mice shown in d (n = 8 (shCtrl) and 7 (shLeXis1)). f, Gene 
expression in livers of mice fed a Western diet for 2 weeks and then 
transduced with shCtrl or shLeXis (n = 8 (shCtrl) and 7 (shLeXis1)). 

g, Total plasma cholesterol levels in chow-fed C57BL/6 mice transduced 
with shCtrl or shLeXis adenovirus and gavaged with 40 mg kg−1 
GW3965 for 6 days (n = 8 per group). h, Gene expression in livers of 
chow-fed C57BL/6 mice transduced with shCtrl or shLeXis adenovirus 
and gavaged with 40 mg kg−1 GW3965 for 6 days (n = 8 per group). 

i, Serum alanine aminotransferase activity from mice in h. j, Serum 
alanine aminotransferase activity from mice in d. k, Gene expression in 
livers of mice fed a Western diet for 2 weeks and then transduced with 
shCtrl or shLeXis (n = 8 (shCtrl) and 7 (shLeXis1)). l, Serum alanine 
aminotransferase activity from C57BL/6 mice on a chow diet administered 
25 mg kg−1 ASOs intraperitoneally on days 1, 4 and 7, and gavaged with 
40 mg kg−1 GW3965 on days 4, 7 and 8 (n = 5 per group). m, Total serum 
cholesterol from C57BL/6 mice on a chow diet administered 25 mg kg−1 
ASOs intraperitoneally on days 1, 3 and 5, and gavaged with 40 mg kg−1 
GW3965 on days 5 and 6 (n = 8 per group). Values are mean ± s.d. (a) or 
mean ± s.e.m. (f, h–m). *P < 0.05; **P < 0.01; ***P < 0.001 (unpaired 
two-tailed t-test (b, d–h, j) and ANOVA with multi-group comparison 
(m)). 
Extended Data Figure 6 | Generation of global LeXis−/− mice. 

a, Schematic of knockout strategy. Vector construct designed to ablate 
entire LeXis transcript. Targeted mice were crossed with Flp~/~ (also 
known as Hpd~‘~) mice to excise the Neo cassette since it contains an 
active bi-directional promoter. b, c, Gene expression (n = 3 per group) 


(unpaired two-tailed t-test). 
Extended Data Figure 7 | Identification of RALY as a LeXis-interacting 
protein. a, Complementary biotin-labelled tiling oligonucleotides 
incubated with cellular extracts from liver. Probe sets designed to retrieve 
LeXis (Lex 1 and 2) or LacZ (LacZ 1 and 2). Percentage input of retrieved 
LeXis and 36B4 are shown (n =4 per group). b, Cellular contents separated 
into cytoplasmic soluble (C), nuclear soluble (N) and insoluble (pellet, P) 
fractions were analysed by western blotting with anti-RALY and anti- 
histone H3 antibodies. c, Antibodies were incubated with cellular lysates 
from mouse hepatocytes and interaction with endogenous RALY was 
assessed after immunoprecipitation and western blot. d, Complexes from 
b were analysed for the presence of LeXis or Gapdh by reverse transcription 
qPCR (RT-qPCR) and signals were normalized to 36B4 (n = 4 per group). 
e, Sequence alignment, predicted secondary structure, and 3D model 
of RALY are shown as reported using the Phyre2 (Protein Homology/ 
analogY Recognition Engine V 2.0) web portal. f, Western blot for 
RALY from livers transduced with adenoviral vectors expressing control 
shRNA (shCtrl) or Raly shRNA (shRaly) (n = pooled 4 animals per group). 
g, Gene expression from liver from 14-week-old chow-fed male C57BL/6 
mice transduced with control (shCtrl) or shRaly (n = 8 per group). Values 
are mean ± s.d. (a) or mean ± s.e.m. (g). 
Extended Data Figure 8 | Knockdown of RALY preferentially affects pathways linked to cholesterol metabolism in mouse liver. a, b, Most significant 
Gene Ontology terms from microarray analysis from livers treated with shCtrl or shRaly. Analysis performed using GeneSpring and DAVID. 
Extended Data Figure 10 | Batch genome conversion between mouse 
and human at LeXis gene locus. Gene expression for putative human non- 
coding RNA TCONS_00016452 in hepatocyte cell lines treated with 1 μM 
GW3965 (n = 3 per group). Values are mean ± s.d. *P < 0.05; **P < 0.01; 
***P < 0.001 (unpaired two-tailed t-test). 
Overcoming EGFR(T790M) and EGFR(C797S) 
resistance with mutant-selective allosteric inhibitors 

Yong Jia, Cai-Hong Yun, Eunyoung Park, Dalia Ercan, Mari Manuia, Jose Juarez, Chunxiao Xu, Kevin Rhee, 
Ting Chen, Haikuo Zhang, Sangeetha Palakurthi, Jaebong Jang, Gerald Lelais, Michael DiDonato, Badry Bursulaya, 
Pierre-Yves Michellys, Robert Epple, Thomas H. Marsilje, Matthew McNeill, Wenshuo Lu, Jennifer Harris, Steven Bender, 
Kwok-Kin Wong, Pasi A. Jänne & Michael J. Eck 


The epidermal growth factor receptor (EGFR)-directed tyrosine 
kinase inhibitors (TKIs) gefitinib, erlotinib and afatinib are 
approved treatments for non-small cell lung cancers harbouring 
activating mutations in the EGFR kinase, but resistance arises 
rapidly, most frequently owing to the secondary T790M mutation 
within the ATP site of the receptor. Recently developed mutant- 
selective irreversible inhibitors are highly active against the T790M 
mutant, but their efficacy can be compromised by acquired 
mutation of C797, the cysteine residue with which they form a 
key covalent bond. All current EGFR TKIs target the ATP-site 
of the kinase, highlighting the need for therapeutic agents with 
alternative mechanisms of action. Here we describe the rational 
discovery of EAI045, an allosteric inhibitor that targets selected 
drug-resistant EGFR mutants but spares the wild-type receptor. 
The crystal structure shows that the compound binds an allosteric 
site created by the displacement of the regulatory C-helix in an 
inactive conformation of the kinase. The compound inhibits L858R/ 
T790M-mutant EGFR with low-nanomolar potency in biochemical 
assays. However, as a single agent it is not effective in blocking 
EGFR-driven proliferation in cells owing to differential potency 
on the two subunits of the dimeric receptor, which interact in an 
asymmetric manner in the active state. We observe marked synergy 
of EAI045 with cetuximab, an antibody therapeutic that blocks 
EGFR dimerization, rendering the kinase uniformly susceptible 
to the allosteric agent. EAI045 in combination with cetuximab is 
effective in mouse models of lung cancer driven by EGFR(L858R/ 
T790M) and by EGFR(L858R/T790M/C797S), a mutant that is 
resistant to all currently available EGFR TKIs. More generally, our 
findings illustrate the utility of purposefully targeting allosteric sites 
to obtain mutant-selective inhibitors. 

Diverse activating mutations within the EGFR kinase domain give 
rise to a subset of non-small cell lung cancers (NSCLCs). The L858R 
point mutation and small in-frame deletions in the region encoded by 
exon 19 are the most common mutations, and are among a subset of 
oncogenic EGFR alterations that confer enhanced sensitivity to EGFR- 
directed TKIs. The dose-limiting toxicity of anilinoquinazoline 
TKIs such as erlotinib and gefitinib arises from inhibition of wild-type 
EGFR in the skin and GI tract, thus this enhanced sensitivity relative to 
wild-type EGFR creates a therapeutic window that allows effective 
treatment of patients whose tumours are driven by these muta- 
tions. The T790M resistance mutation closes this window, in part by 
increasing the affinity of the mutant receptor for ATP, which in turn 
diminishes the potency of these ATP-competitive inhibitors. Mutant- 
selective irreversible inhibitors, including the tool compound WZ4002 
(ref. 15) and the clinical compounds osimertinib (AZD9291) and 
rociletinib (CO-1686), are based on a pyrimidine scaffold, and also 
incorporate a Michael acceptor group that forms a covalent bond with 
Cys797 at the edge of the ATP binding pocket. Because they bind irre- 
versibly, these agents overcome the enhanced ATP affinity conferred 
by the T790M mutation. Compounds of this class are demonstrating 
significant efficacy against T790M mutant tumours in ongoing clinical 
trials, and osimertinib was recently approved by the US Food and 
Drug Administration for patients with EGFR T790M-positive NSCLC 
following progression on previous EGFR TKI therapy. However, 
laboratory studies and early clinical experience indicate that the efficacy 
of these agents can be compromised by mutation of Cys797, which 
thwarts formation of the potency-conferring covalent bond. 

Reasoning that an allosteric inhibitor could also overcome the 
enhanced ATP affinity conferred by the T790M mutation, we screened 
an ~2.5 million compound library using purified EGFR(L858R/ 
T790M) kinase. The biochemical screen was carried out using 1 μM 
ATP, and active compounds were counter-screened at 1 mM ATP 
and against wild-type EGFR to identify those that were potentially 
non-ATP-competitive and mutant selective. Among the compounds 
identified in the screen, EGFR allosteric inhibitor-1 (EAI001, Fig. 1a) 
was of particular interest owing to its potency and selectivity for mutant 
EGFR (half-maximal inhibitory concentration (IC50) = 0.024 μM for 
L858R/T790M at 1 mM ATP, IC50 > 50 μM for wild-type EGFR). 
Further characterization of the mutant-selectivity of EAI001 revealed 
modest potency against the isolated L858R and T790M mutants 
(0.75 μM and 1.7 μM, respectively, Extended Data Fig. 1a). Medicinal- 
chemistry-based optimization of this compound yielded EAI045 
(Fig. 1a), a 3 nM inhibitor of the L858R/T790M mutant with ~1000- 
fold selectivity versus wild-type EGFR at 1 mM ATP (Table 1). Enzyme 
kinetic characterization confirmed that the mechanism of inhibition 
was not competitive with respect to ATP (Table 1, Extended Data 
Fig. 1b). Profiling of EAI045 against a panel of 250 protein kinases 
revealed pronounced selectivity; no other kinases were inhibited by 
more than 20% at 1 μM EAI045 (Extended Data Table 1). Evaluation 
of EAI045 in a safety pharmacology assay panel also revealed excellent 
selectivity against non-kinase targets (Extended Data Table 2). 
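
The counter-screen at 1 μM versus 1 mM ATP relies on how the apparent IC50 responds to ATP concentration. As a minimal sketch, assuming the textbook Cheng-Prusoff relation for a purely ATP-competitive inhibitor and an idealized, purely non-competitive inhibitor (this is an illustration, not the kinetic analysis used in the study), the Python below compares the two behaviours. The Ki and ATP Km values are hypothetical placeholders, not measured parameters.

```python
def ic50_competitive(ki_um, atp_um, km_atp_um):
    """Apparent IC50 (uM) of an ATP-competitive inhibitor (Cheng-Prusoff)."""
    return ki_um * (1.0 + atp_um / km_atp_um)


def ic50_noncompetitive(ki_um, atp_um, km_atp_um):
    """Apparent IC50 (uM) of a purely non-competitive inhibitor.

    In this idealized case the IC50 equals Ki and does not depend on ATP.
    """
    return ki_um


# Hypothetical parameters chosen only for illustration.
KI_UM = 0.003      # assumed inhibitor Ki
KM_ATP_UM = 10.0   # assumed ATP Km of the kinase

for atp_um in (1.0, 1000.0):  # the two ATP concentrations used in the screen
    comp = ic50_competitive(KI_UM, atp_um, KM_ATP_UM)
    noncomp = ic50_noncompetitive(KI_UM, atp_um, KM_ATP_UM)
    print(f"ATP {atp_um:g} uM: competitive {comp:.4f} uM, "
          f"non-competitive {noncomp:.4f} uM")
```

Under these assumptions a competitive compound loses potency sharply at 1 mM ATP, whereas a non-competitive one keeps a near-constant IC50, which is the pattern reported for EAI045 on the L858R/T790M mutant in Table 1.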

The crystal structure of EAI001 bound to T790M-mutant EGFR 
showed that the compound binds in an allosteric pocket that is 
created in part by the outward displacement of the C-helix in the inactive 
conformation of the kinase (Fig. 1b, c, Extended Data Table 3). 
The compound binds as a ‘three-bladed propeller’ with the aminothi- 
azole moiety inserted between the mutant gatekeeper methionine and 
active site residue Lys745. The phenyl substituent extends into a hydro- 
phobic cleft at the back of the pocket and is in contact with Leu777 and 
Phe856. Finally, the 1-oxoisoindolinyl group extends along the C-helix 
towards the solvent exposed exterior. The compound also forms a 
hydrogen bond with Asp855 in the DFG motif. In further support of 


Figure 1 | Structure and binding mode of allosteric EGFR inhibitors. 

a, Chemical structures of EAI001 and EAI045. b, Overall view of the 
structure of EGFR(T790M/V948R) bound to EAI001 and AMP-PNP. 
EAI001 is shown in CPK-coloured form with carbon atoms in green. The 
V948R substitution was introduced to allow crystallization of the kinase in 
an inactive conformation. c, Detailed view of the interactions of EAI001. 
A hydrogen bond with Asp855 in the DFG-motif of the kinase activation 
loop is shown as a dashed red line. d, The structure of irreversible inhibitor 
neratinib bound to EGFR(T790M) (PDB, 2JIV). Neratinib occupies the 
ATP site, but also extends into the allosteric pocket occupied by EAI001. 


a non-ATP competitive mechanism, the ATP-analogue adenylyl-im- 
idodiphosphate (AMP-PNP) is bound in the expected manner in the 
active site cleft (Fig. 1c). 

Interestingly, the EGFR inhibitors neratinib and lapatinib extend 
into the allosteric site and make interactions that resemble those of 
two of the three blades of the allosteric agents (Fig. 1d, Extended Data 
Fig. 2a, c). These ATP-competitive inhibitors are not mutant selec- 
tive, and they span both the ATP and allosteric sites. Additionally, we 
note that the EGFR allosteric pocket is roughly analogous to a site in 
MEK1 that is targeted by a number of allosteric inhibitors that are now 
approved or in clinical trials. Despite the similar location of the MEK 
allosteric site, there is no structural correspondence in the binding 
modes of the respective allosteric inhibitors (Extended Data Fig. 2a, b). 

The mutant-specificity of the EGFR allosteric inhibitors arises from 
at least two effects. Most apparently, the direct contact of the aminothi- 
azole group with the mutant gatekeeper methionine residue can explain 
the selectivity for the T790M mutant. Second, the compound cannot 
bind the fully inactive conformation of the wild-type kinase; simple 
modelling reveals steric clashes of EAI001 with Leu858 and Leu861 in 
the N-terminal portion of the activation loop (Extended Data Fig. 3). 
The L858R mutation rearranges this portion of the activation loop, 
thereby enlarging the allosteric pocket. EAI045 may also inhibit other 
mutants with a similar mechanism of activation, such as L861Q, but 
we do not expect it to inhibit most exon 19 deletion variants. These 
mutations shorten the loop leading into the C-helix and may therefore 
prevent opening of the allosteric pocket. 

Initial studies of the cellular activity of EAI045 showed that it 
potently decreased, but did not completely eliminate, EGFR autophos- 
phorylation in H1975 cells, an L858R/T790M-mutant NSCLC cell line 
(Fig. 2a). A similar effect was observed in NIH-3T3 cells stably trans- 
fected with the L858R/T790M mutant (Extended Data Fig. 4a). This 
inhibition was selective for mutant EGFR; EAI045 potently inhibited 
EGFR Y1173 phosphorylation in H1975 cells (half maximal effective 
concentration (EC50) = 2 nM), but not in HaCaT cells, a keratinocyte 
Table 1 | Inhibitory activity of EAI045 on wild type EGFR and 
selected mutants 

ATP (μM)    EAI045 IC50 (μM) 
            Wild type    L858R    T790M    L858R/T790M 
1           1.6          0.076    0.049    0.002 
10          1.9          0.019    0.19     0.002 
100         35)          0.009    0.5      0.003 
1000        43           0.009    0.6      0.003 


cell line with wild-type EGFR (Extended Data Table 4). We observed 
an intermediate level of activity in the L858R-mutant H3255 cells, a 
pattern consistent with our biochemical inhibition data (Extended Data 
Table 4). Despite potent inhibition of mutant EGFR, EAI045 showed no 
anti-proliferative effect in the H1975 and H3255 cell lines with concen- 
trations as high as 10 μM (Extended Data Table 4). Profiling in a panel of 
EGFR-mutant Ba/F3 cells revealed that EAI045 inhibited proliferation 
of L858R/T790M and L858R mutant cells, but not the exon19del/ 
T790M or parental Ba/F3 cells, indicative of on-target mutant-selective 
activity of the allosteric inhibitor (Extended Data Fig. 4b-e). However, 
half-maximal inhibition required ~10 μM EAI045, a concentration 
much higher than the biochemical IC50 of the compound. 

In light of the incomplete inhibition of EGFR autophosphorylation 
and the allosteric mechanism of action of EAI045, we wondered to what 
extent ligand stimulation would affect inhibition of the mutant receptor. 
We compared inhibition of EGFR Y1173 phosphorylation in H1975 
cells in the presence and absence of exogenous EGF (10 ng ml−1) using 
an ELISA-based assay. EAI045 inhibited EGFR phosphorylation with 
a similar EC50 irrespective of EGF stimulation, but notably, inhibition 
plateaued at 50% in the presence of ligand (Fig. 2b). This phenomenon 
suggests two populations of receptor, one that remains sensitive to 
the allosteric inhibitor upon ligand stimulation, and another, equal in 
number, that is rendered insensitive. Ligand-induced dimerization of 
the EGF receptor is known to induce an asymmetric interaction of the 
kinase domains, and is an apparent potential source of two receptor 
populations with differential inhibitor sensitivity. 
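
The two-population interpretation can be made concrete with a simple dose-response sketch. The Python below assumes a standard hyperbolic (Hill slope 1) response for the sensitive population and no response at all for the insensitive one; both are simplifying assumptions for illustration, and `sensitive_fraction` is a hypothetical parameter name. The EC50 of 2 nM is the value reported above for H1975 cells.

```python
def fraction_inhibited(conc_nm, ec50_nm, sensitive_fraction):
    """Overall fractional inhibition of receptor phosphorylation.

    Only `sensitive_fraction` of the receptor population responds to the
    inhibitor; the remainder is treated as fully insensitive (assumption).
    """
    return sensitive_fraction * conc_nm / (conc_nm + ec50_nm)


EC50_NM = 2.0  # reported EC50 for pY1173 inhibition in H1975 cells

# With ligand, half of the receptors are assumed to be insensitive.
for conc_nm in (0.2, 2.0, 20.0, 2000.0):
    with_egf = fraction_inhibited(conc_nm, EC50_NM, sensitive_fraction=0.5)
    print(f"{conc_nm:g} nM EAI045 with EGF: {with_egf:.3f} inhibited")
```

In this model the curve saturates at the sensitive fraction (here 50%), while the concentration giving half of that plateau is still the EC50, consistent with the observation that EGF changed the maximal inhibition but not the EC50.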

In the EGFR asymmetric dimer, the C-lobe of the ‘activator’ subunit 
impinges on the N-lobe of the ‘receiver’ subunit, inducing an active 
conformation in the receiver by reorienting the regulatory C-helix 
to its inward position (Fig. 2c). In wild-type EGFR, only the receiver 
subunit is activated. By contrast, both subunits in a mutant recep- 
tor are expected to be catalytically active, because oncogenic kinase 
domain mutations induce the active conformation even in the absence 
of ligand. As explained above, EAI045 binds a ‘C-helix out’ conforma- 
tion of the kinase. In the receiver subunit but not the activator, outward 
displacement of the C-helix is impeded by the asymmetric dimer inter- 
action. Therefore, we hypothesized that EAI045 was a potent inhib- 
itor of the activator subunit of the mutant receptor, but a much less 
potent inhibitor of the receiver subunit, in which the C-helix is captive. 
Because the mutant receptor favours dimer formation, this effect 
could explain both the incomplete inhibition of EGFR autophospho- 
rylation and the apparent disconnect in the biochemical and cellular 
potencies of the allosteric inhibitor. To test this notion, we exploited an 
I941R point mutation in the C-lobe of the kinase, which is known to 
block the asymmetric dimer interaction. The activity of the L858R/ 
T790M mutant is dimerization-independent and, as expected, trans- 
duction of Ba/F3 cells with EGFR(L858R/T790M/I941R) led to fac- 
tor-independent proliferation. In support of our hypothesis, Ba/F3 
cells bearing this dimerization-defective mutant were markedly more 
sensitive to the allosteric inhibitor (Fig. 2d). 

The therapeutic antibody cetuximab targets the extracellular portion 
of the EGF receptor, blocking ligand binding and preventing dimer 
formation. The antibody is not effective clinically in EGFR-mutant 
NSCLC, and in cell-based studies cetuximab alone does not inhibit 
L858R/T790M or exon19del/T790M mutant EGFR, because their 
activity is independent of dimerization. However, we reasoned that 
Figure 2 | Cellular activity and mechanism of synergy of EAI045 with 
cetuximab. a, Analysis of EAI045 inhibition of EGFR phosphorylation 

in H1975 cells by western blotting (anti-pY1068). A dose response study 
is shown at 3h after compound addition for EAI045 and the irreversible 
quinazoline inhibitor afatinib (control). For gel source data, see 
Supplementary Fig. 1. b, The effect of EAI045 on EGFR target modulation 
in H1975 cells in the presence and absence of EGF. EGFR phosphorylation 
(pY1173) was measured using an ELISA-based assay; error bars indicate 
s.d. (n = 3). c, The allosteric pocket is differentially accessible in the two 
subunits of the asymmetric dimer. Unlike wild-type EGFR in which only 
the receiver subunit is active, both subunits are catalytically active in the 
L858R/T790M mutant. The activator subunit is more readily inhibited 

by allosteric agents (yellow star), because the C-helix can be readily 
displaced. By contrast, opening the allosteric pocket in the receiver 
subunit requires perturbing the dimer. Thus mutations that disrupt the 
asymmetric dimer (such as I941R, blue circle) or antibodies that block 
dimerization (cetuximab) should enhance the potency of allosteric agents. 
d, Inhibition of proliferation of Ba/F3 cells expressing L858R/T790M 

and L858R/T790M/I941R by EAI045. Addition of the dimer-disrupting 
I941R mutation markedly increased inhibition by EAI045. e, f, Treatment 
of EGFR-mutant Ba/F3 cells with EAI045 alone, in combination with 
cetuximab (10 μg ml−1), or with cetuximab alone. Note the pronounced 
synergy with cetuximab that is observed only in the L858R/T790M model. 
The mean + s.d. (n =6) is plotted for each drug and concentration (d-f). 


cetuximab should synergize with a kinase-targeted allosteric inhibitor, 
by converting the inhibitor-resistant receiver population into a mon- 
omeric form that is remarkably sensitive to EAI045. Notably, in the 
presence of cetuximab (10 μg ml−1), EAI045 inhibited proliferation 
of EGFR(L858R/T790M) Ba/F3 cells with an IC50 of approximately 
10 nM, similar to its potency against this mutant in biochemical assays 
(Fig. 2e). In support of an on-target, mutant-selective effect of the 
allosteric agent, proliferation of Ba/F3 cells bearing EGFR(exon19del/ 
T790M) was not inhibited by this combination (Fig. 2f). 

We next tested the in vivo efficacy of EAI045 in a genetically 
engineered mouse model of L858R/T790M-mutant-driven lung 
cancer, both alone and in combination with cetuximab. Mouse 
Figure 3 | EAI045 in combination with cetuximab induces tumour 
regression in genetically engineered mouse models of EGFR-mutant 
lung cancer. a, Mice bearing L858R/T790M mutant tumours were treated 
with EAI045 alone (n = 5), cetuximab alone (n = 3) or both agents in 
combination (n = 10). Tumour volumes were measured using MRI 4 weeks 
after initiation of treatment and are plotted for each animal in a ‘waterfall’ 
format. b, As in a, but in mice bearing exon19del/T790M mutant tumours 
(n = 4, 4 and 4). c, As in a, but in mice bearing L858R/T790M/C797S 
mutant tumours (n = 3, 4, and 5). d, e, Pharmacodynamic studies in 
exon19del/T790M and L858R/T790M/C797S mice. Tumour nodules from 
mice treated with EAI045 or cetuximab alone or with the combination 
(combo.) were analysed by western blotting with the indicated 

antibodies to examine the effect of treatment on EGFR signalling. 
Multiple independent mouse tumours were obtained and analysed, two 
independent and representative samples are shown. For gel source data, 
see Supplementary Fig. 1. Source data for tumour volume measurements 
are provided in Supplementary Fig. 2. 


pharmacokinetic studies with EAI045 revealed a maximal plasma 
concentration of 0.57 μM, a half-life of 2.15 h, and oral bioavailability 
of 26% after dosing at 20 mg kg−1. In a 4-week efficacy study, mice were 
treated with EAI045 at 60 mg kg−1 by oral gavage once daily, either 
alone or together with cetuximab (1 mg intraperitoneally every other 
day). We observed marked tumour regressions in the L858R/T790M- 
mutant mice treated with the combination, whereas those treated with 
EAI045 alone did not respond (Fig. 3a). Cetuximab alone had a very 
modest effect in these mice, as previously observed. Mice bearing 
EGFR(exon19del/T790M) were treated using the same protocol, but 
as expected failed to respond to the combination therapy (Fig. 3b). 
Magnetic resonance imaging (MRI) studies of cohorts of L858R/ 
T790M and exon19del/T790M mice after combination treatment for 
1 or 2 weeks are shown in Extended Data Fig. 5. 
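
To give a rough sense of what the reported pharmacokinetic values imply for once-daily dosing, the Python below projects plasma concentration after the peak. It is a minimal sketch that assumes simple mono-exponential (first-order) elimination from the reported maximal concentration and ignores absorption and distribution; these are modelling assumptions, not statements from the study. The values are those reported for the 20 mg kg−1 dose and are not rescaled to the 60 mg kg−1 efficacy dose.

```python
CMAX_UM = 0.57      # reported maximal plasma concentration (20 mg/kg dose)
HALF_LIFE_H = 2.15  # reported half-life in hours


def plasma_conc_um(hours_after_peak, cmax_um=CMAX_UM, half_life_h=HALF_LIFE_H):
    """Plasma concentration (uM) assuming first-order decay from the peak."""
    return cmax_um * 0.5 ** (hours_after_peak / half_life_h)


for hours in (0.0, 2.15, 6.0, 12.0, 24.0):
    print(f"{hours:5.2f} h after peak: {plasma_conc_um(hours):.5f} uM")
```

Under this assumption the concentration halves every 2.15 h, so exposure late in a 24-hour dosing interval is far below the peak.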

Mutation of C797 is expected to confer resistance to all third-generation 
irreversible EGFR inhibitors that are active on the T790M-mutant 
EGFR, and a preliminary study reported the C797S alteration in 15 
out of 67 patients (22%) with acquired resistance to AZD9291 (ref. 28). 
Mutations in C797 should not affect the efficacy of EAI045, as this 
residue is remote from the allosteric binding pocket. Consistent with 
this expectation, EAI045 in combination with cetuximab potently 
inhibited L858R/T790M/C797S Ba/F3 cells (Extended Data Fig. 5a) 
and treatment of genetically engineered L858R/T790M/C797S mice 
with EAI045 and cetuximab induced marked tumour shrinkage, similar 
to that observed in the L858R/T790M models (Fig. 3c, Extended Data 
Fig. 5b). Pharmacodynamic studies performed following two doses 
of treatment demonstrated that EAI045 in combination with cetuxi- 
mab effectively inhibited phosphorylation of EGFR and downstream 
signalling proteins in these mice, but not in mice bearing the insensitive 
exon19del/T790M mutation (Fig. 3d, e). 

The compounds we describe here are among the first allosteric TKIs, 
and to our knowledge, the first targeting any receptor tyrosine kinase in 
a mutant-selective manner. Further study is required, but our findings 
suggest that EAI045 or a related compound in combination with an 
EGFR dimer-disrupting antibody such as cetuximab would be an effec- 
tive strategy for treating L858R/T790M-mutant-driven lung cancers, 
as well as those driven by the triple L858R/T790M/C797S mutation, 
which are resistant to all current EGFR-targeted therapies. EAI045 
and cetuximab exhibit mechanistic synergy, a valuable property for 
combination agents because it lowers the dose required for efficacy. 
Ideally, chemotherapeutic agents used in combination should also have 
non-overlapping mechanisms of toxicity and sensitivity to resistance 
mutations. EAI045 meets these criteria as well; its lack of activity on 
wild-type EGFR and other kinases suggests that its dose-limiting toxic- 
ity is unlikely to be related to that of cetuximab and ATP-competitive 
EGFR inhibitors. In addition, given its distinct binding site, its sensitiv- 
ity to resistance-conferring mutations is expected to be divergent from 
that of both cetuximab and ATP-site inhibitors. For these reasons, we 
speculate that an allosteric agent like EAI045 could be used in combi- 
nation with ATP-site-directed inhibitors, with the goal of preventing 
the emergence of treatment-associated resistance mutations in the 
receptor itself. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 

EGFR protein expression and purification. Constructs spanning residues 
696-1022 of the human EGFR (including wild type, L858R, L858R/T790M, 
T790M, and T790M/V948R mutant sequences) were prepared in a GST-fusion 
format using the pTriEX system (Novagen) for expression in Sf9 insect cells essen- 
tially as described. EGFR kinase proteins were purified by glutathione-affinity 
chromatography followed by size-exclusion chromatography after cleavage with 
tobacco etch virus (TEV) protease or thrombin to remove the GST fusion partner following 
established procedures. 

High-throughput screening. Purified EGFR(L858R/T790M) enzyme was 
screened against the Novartis compound collection of ~2.5 million compounds using a homoge- 
neous time-resolved fluorescence (HTRF)-based biochemical assay format. The 
screening was performed at 1 μM ATP using a single compound concentration 
(12.5 μM). The 1,322 top hits were picked for follow-up IC50 confirmation. IC50 values 
were determined at both 1 μM and 1 mM ATP to identify both ATP-competitive 
and non-competitive compounds. Hits were also counter-screened against 
wild-type EGFR to evaluate the mutant selectivity. 

HTRF-based EGFR biochemical assays. Biochemical assays for wild-type EGFR 
and each mutant were carried out using a HTRF assay as described previously. 
Assays were optimized for each ATP concentration. Compound IC50 values were 
determined by 12-point inhibition curves (from 50 to 0.000282 μM) in duplicate. 
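
The stated range of the 12-point curve (50 to 0.000282 μM) is numerically consistent with a threefold serial dilution, although the dilution factor is an inference and is not stated in the text. The Python sketch below generates such a series and evaluates a generic four-parameter logistic model at each point; the model form, the Hill slope of 1 and the example IC50 of 0.003 μM are illustrative assumptions rather than the fitting procedure actually used.

```python
def dilution_series(top_um=50.0, fold=3.0, points=12):
    """Concentrations (uM) of a serial dilution starting at `top_um`."""
    return [top_um / fold ** i for i in range(points)]


def four_pl(conc_um, ic50_um, hill=1.0, top=100.0, bottom=0.0):
    """Percent activity remaining under a four-parameter logistic model."""
    return bottom + (top - bottom) / (1.0 + (conc_um / ic50_um) ** hill)


series = dilution_series()
EXAMPLE_IC50_UM = 0.003  # illustrative value, roughly the reported EAI045 potency

for conc_um in series:
    activity = four_pl(conc_um, EXAMPLE_IC50_UM)
    print(f"{conc_um:12.6f} uM -> {activity:6.2f}% activity")
```

With a threefold step, twelve points span more than five orders of magnitude, which brackets both a low-nanomolar IC50 on the double mutant and a micromolar IC50 on the wild-type enzyme within a single curve.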
Structure determination. Before crystallization, 0.1 mM of EGFR(T790M/V948R) 
was incubated for 1 h with 0.5 mM EAI001, 1 mM adenosine 5′-(β,γ-imido) 
triphosphate (AMP-PNP) and 10 mM MgCl2 at room temperature. Crystals of 
EGFR(T790M/V948R) in complex with EAI001 were prepared by hanging-drop 
vapour diffusion method over a reservoir solution containing 0.1 M Bis-Tris 
(pH 5.5), 25% PEG 3350, 5 mM tris (2-carboxyethyl)-phosphine (TCEP). Crystals 
were flash-frozen in liquid nitrogen after rapid immersion in a cryoprotectant 
solution containing 0.1 M Bis-Tris (pH 5.5), 25% PEG 3350, 10% ethylene glycol and 
5mM TCEP. Diffraction data were recorded using a Mar343 image plate detector 
on a rotating anode source at 100 K. Data were processed and merged as described 
previously. The structure was determined by molecular replacement with the 
program PHASER using an inactive EGFR kinase structure (PDB, 2GS7) as the 
search model. Repeated rounds of manual refitting and crystallographic refinement 
were performed using COOT and REFMAC. The inhibitor was modelled into 
the closely fitting positive Fo − Fc electron density and then included in following 
refinement cycles. Although the EAI001 preparation used in crystallization was 
racemic, the density clearly corresponded to the R stereoisomer and was modelled 
accordingly. Topology and parameter files for the inhibitors were generated using 
PRODRG. Statistics for diffraction data processing and structure refinement are 
shown in Extended Data Table 3. 

Tissue Culture. Cells were maintained in 10% FBS/RPMI supplemented with 
100 μg ml−1 penicillin/streptomycin (Hyclone SH30236.01). The cells were col- 
lected with 0.25% trypsin/EDTA (Hyclone SH30042.1), re-suspended in 5% FBS/ 
RPMI penicillin/streptomycin and plated at 7,500 cells per well in 50 μl of media 
in a 384-well black plate with clear bottoms (Greiner 789068G). The cells were 
allowed to incubate overnight in a 37 °C, 5% CO2 humidified tissue culture incu- 
bator. The 12-point serial diluted test compounds were transferred to the plate 
containing cells by using a 50 nl Pin Head device (Perkin Elmer) and the cells were 
placed back in the incubator for 3h. All cell lines were tested and found negative 
for mycoplasma contamination using the MycoAlert Mycoplasma Detection Kit 
(Lonza). 

Phospho-EGFR (Y1173) target modulation assay. HaCaT cells were stimulated 
with 10 ng ml−1 EGF (Peprotech AF-100-15) for 5 min at room temperature.
Constitutively activated EGFR mutant cell lines (H1975 and H3255) were not stimulated
with EGF. The media was reduced to 20 µl using a Bio-Tek ELx405 Select
plate washer. Cells were lysed with 20 µl of 2× lysis buffer containing protease and
phosphatase inhibitors (2% Triton X-100, 40 mM Tris (pH 7.5), 2 mM EDTA, 2 mM
EGTA, 300 mM NaCl, 2× complete cocktail inhibitor (Roche 11 697 498 001),
2× phosphatase inhibitor cocktail set II and set III (Sigma P5726 and P0044)). The
plates were shaken for 20 min. An aliquot of 25 µl from each well was transferred
to prepared ELISA plates for analysis. 
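
Because 20 µl of 2× lysis buffer is added to 20 µl of residual media, each buffer component ends up at half its stated concentration in the lysate. The sketch below works through that dilution. It assumes simple additive volumes and ignores any salts contributed by the culture medium itself; the dictionary and function names are illustrative only.

```python
# Components of the 2x lysis buffer as listed in the text.
LYSIS_BUFFER_2X = {
    "Triton X-100 (%)": 2.0,
    "Tris pH 7.5 (mM)": 40.0,
    "EDTA (mM)": 2.0,
    "EGTA (mM)": 2.0,
    "NaCl (mM)": 300.0,
}


def dilute(components, buffer_ul, sample_ul):
    """Concentration of each buffer component after mixing with the sample."""
    factor = buffer_ul / (buffer_ul + sample_ul)
    return {name: conc * factor for name, conc in components.items()}


# 20 µl of buffer into 20 µl of media gives a 1:1 mix (working strength).
final_lysate = dilute(LYSIS_BUFFER_2X, buffer_ul=20.0, sample_ul=20.0)
```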

For the experiment studying the effect of EGF pre-treatment on EAI045 target 
modulation, H1975 cells were collected and plated in 0.5% FBS/RPMI penicillin/ 
streptomycin. On the following day, cells were pre-treated with 0.5% FBS/RPMI 
media with or without 10ng EGF per ml for 5 min. Compound was added and 
assay was carried out as described above. The experiment was performed twice 
with duplicate samples in each experiment. 

Phospho-EGFR (Y1173) ELISA. Solid white 384-well high-binding ELISA plates 
(Greiner 781074) were coated with 5 µg ml−1 goat anti-EGFR capture antibody
overnight in 50 mM carbonate/bicarbonate (pH 9.5) buffer. Plates were blocked
with 1% BSA (Sigma A7030) in PBS for 1 h at room temperature, and washes were
carried out with a Bio-Tek ELx405 Select using four cycles of 100 µl TBS-Tween
(20 mM Tris, 137 mM NaCl, 0.05% Tween-20) per well. A 25 µl aliquot of lysed
cells was added to each well of the ELISA plate and incubated overnight at 4 °C
with gentle shaking. After washing, 1:1,000 anti-phospho-EGFR in 0.2% BSA/ 
TBS-Tween was added and incubated for 2h at room temperature. After washing, 
1:2,000 anti-rabbit-HRP (horseradish peroxidase) in 0.2% BSA/TBS-Tween was 
added and incubated for 1h at room temperature. Chemiluminescent detection 
was carried out with SuperSignal ELISA Pico substrate. Luminescence was read 
with an EnVision plate reader. 

Western blotting. Cell lysates were equalized to protein content determined by 
Coomassie Plus protein assay reagent (ThermoScientific 1856210) and loaded 
onto 4-12% NuPAGE Bis-Tris gels with MOPS running buffer with LDS Sample 
buffer supplemented with DTT. Gel proteins were transferred to PVDF membranes
with an iBlot Gel Transfer Device. Membranes were blocked with 1× casein and
probed with primary antibodies overnight at 4°C on an end-over-end rotisserie. 
Membranes were washed with TBS-Tween and HRP-conjugated secondary anti- 
bodies were added for 1h at room temperature. After washing, HRP was detected 
using Luminata Forte Western HRP Substrate reagent and recorded with a Bio-Rad 
VersaDoc imager. 

H1975, H3255 and HaCaT proliferation assays. H1975, H3255 and HaCaT cell 
lines were plated in solid white 384-well plates (Greiner) at 500 cells per well in 10% 
FBS RPMI penicillin/streptomycin media. Using a Pin Tool, 50 nl of serial diluted 
compounds were transferred to the cells. After 3 days, cell viability was measured 
by CellTiter-Glo (Promega) according to manufacturer's instructions. Luminescent 
readout was normalized to 0.1% DMSO-treated cells and empty wells. Data were
analysed by nonlinear regression curve fitting and EC50 values were reported.

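The normalization and curve model described above can be sketched as follows. This is an illustration only: the text does not specify the regression model for this assay, so the four-parameter logistic form here is an assumed, commonly used choice, and the function names are hypothetical. The fitting step itself (nonlinear least squares) is not shown.

```python
def normalize_viability(raw, dmso_mean, empty_mean):
    """Scale luminescence so empty wells read 0% and DMSO controls read 100%."""
    return 100.0 * (raw - empty_mean) / (dmso_mean - empty_mean)


def four_parameter_logistic(conc, bottom, top, ec50, hill):
    """Sigmoidal dose-response curve (assumed model).

    The response falls from `top` towards `bottom` as `conc` increases and
    equals the midpoint of the two when conc == ec50.
    """
    return bottom + (top - bottom) / (1.0 + (conc / ec50) ** hill)
```

In practice the four parameters would be estimated from the normalized readings across the dilution series, and the fitted `ec50` is the value reported.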
Ba/F3 cell proliferation models. The EGFR mutant L858R, L858R/T790M, 
delE746_A750/T790M, L858R/T790M/C797S and del/T790M/C797S Ba/F3 cells 
have been previously described!°. The EGFR(I941R) mutation was introduced 
via site-directed mutagenesis using the QuikChange Site-Directed Mutagenesis
kit (Stratagene) according to the manufacturer's instructions. All constructs were 
confirmed by DNA sequencing. The constructs were shuttled into the retroviral 
vector JP1540 using the BD Creator System (BD Biosciences). Ba/F3 cells were 
infected with retrovirus according to standard protocols, as described
previously*°. Stable clones were obtained by selection in puromycin (2 µg ml−1).
Ba/F3 cells have not been authenticated as there is no publicly available fingerprint 
for Ba/F3 cells. All variants used were confirmed to contain the correct EGFR 
mutation by sequencing. All Ba/F3 cells were tested for mycoplasma contamination 
and confirmed to be free of contamination. 

Growth and inhibition of growth were assessed by MTS assay, performed
according to previously established methods". Ba/F3 cells of different
EGFR genotypes were exposed to treatment for 72 h; the number of cells used
per experiment was determined empirically, as previously established!°. All
experimental points were set up in six wells and all experiments were repeated at
least three times. The data were graphically displayed using GraphPad Prism version
5.0 for Windows (GraphPad Software; http://www.graphpad.com). The curves
were fitted using a nonlinear regression model with a sigmoidal dose response.

NIH-3T3 cell studies. NIH-3T3 cells were infected with retroviral constructs
expressing EGFR mutants according to standard protocols, as described
previously'>!°. Stable clones were obtained by selection in puromycin (2 µg ml−1).

Mouse efficacy studies. EGFR(TL) (bearing L858R/T790M point mutations) 
and EGFR(TD) (bearing exon19del/T790M point mutations) mice were gen- 
erated as previously described!>””. The EGFR(L858R/T790M/C797S) (denoted 
as TLCS hereafter) mutant mouse cohort was established briefly as follows: the 
full-length human TLCS cDNA was generated by site-directed mutagenesis 
using the QuikChange site-directed mutagenesis kit (Agilent Technologies) and
further verified by DNA sequencing. Sequence-verified targeting vectors were 
co-electroporated with an FLPe recombinase plasmid into v6.5 C57BL/6J 
(female) x 129/sv (male) embryonic stem cells (Open Biosystems) as described 
elsewhere*!. Resulting hygromycin-resistant embryonic stem clones were evaluated 
for transgene integration via PCR. Then, transgene-positive embryonic stem clones 
were injected into C57BL/6 blastocysts, and the resulting chimaeras were mated 
with BALB/c wild type mice to determine germline transmission of the TLCS 
transgene. Further detail on the generation and characterization of the TLCS trans- 
genic mice is provided in Supplementary Fig. 3. Progeny of TL, TD and TLCS mice 
were genotyped by PCR of tail DNA. The TL and TD mice were fed a doxycycline 
diet at 6 weeks of age to induce EGFR(TL) or EGFR(TD) expression, respectively. 
The TLCS mice were intranasally instilled with Ad-Cre (University of Iowa viral 
vector core) at 6 weeks of age to excise the loxP sites, activating EGFR(TLCS) 
expression. 

The EAI045 compound was dissolved in 10% NMP (10% 1-methyl-2-pyrrolidinone:
90% PEG-300), and was dosed at 60 mg kg−1 daily by oral gavage. Cetuximab
was administered at 1 mg per mouse every other day by intraperitoneal injection.
The TL, TD and TLCS mice were monitored by MRI to quantify lung tumour 
burden before being assigned to various study treatment cohorts, which were
non-blinded and not formally randomized. All treated mice had an equal initial 
tumour burden. MRI evaluation was repeated every 2 weeks during treatment. 
The animals were imaged with a rapid acquisition with relaxation enhancement 
sequence (repetition time = 2000 ms; echo time = 25 ms) in the coronal and axial 
planes with a 1-mm slice thickness and with respiratory gating. The detailed procedure
for MRI scanning has been previously described’’. The tumour burden volumes
were quantified using 3D Slicer software. Source data for tumour
volume measurements are provided in Supplementary Fig. 2. 
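
For the weight-based EAI045 regimen described earlier in this section (60 mg per kg daily by oral gavage), the per-animal dose scales with body weight. The sketch below shows the arithmetic; the 25 g body weight and the 6 mg/ml formulation strength used in the example are hypothetical values, as neither is given in the text.

```python
def gavage_dose_mg(body_weight_g, dose_mg_per_kg=60.0):
    """Compound mass (mg) for one animal at a weight-based dose."""
    return dose_mg_per_kg * body_weight_g / 1000.0


def gavage_volume_ul(body_weight_g, formulation_mg_per_ml, dose_mg_per_kg=60.0):
    """Gavage volume (µl) needed to deliver the dose.

    formulation_mg_per_ml is an assumed concentration, not a value from the text.
    """
    dose_mg = gavage_dose_mg(body_weight_g, dose_mg_per_kg)
    return 1000.0 * dose_mg / formulation_mg_per_ml


# Hypothetical example: a 25 g mouse and a 6 mg/ml formulation.
example_dose_mg = gavage_dose_mg(25.0)
example_volume_ul = gavage_volume_ul(25.0, formulation_mg_per_ml=6.0)
```

Cetuximab, by contrast, was given as a flat 1 mg per mouse, so no weight scaling applies to it.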

All care of experimental animals was in accordance with Harvard Medical 
School/Dana-Farber Cancer Institute institutional animal care and use committee 
(IACUC) guidelines. All mice were housed in a pathogen-free environment at a 
DFCI animal facility and handled in strict accordance with Good Animal Practice 
as defined by the Office of Laboratory Animal Welfare. None of the tumour efficacy 
experiments presented in this manuscript exceeded the 2 cm maximal diameter
tumour size, as permitted by the Dana-Farber Cancer Institute IACUC.

Synthesis and characterization of EAI045. 2-(5-fluoro-2-hydroxyphenyl)-2-(1-oxo-2,3-dihydro-1H-isoindol-2-yl)-N-(1,3-thiazol-2-yl)acetamide (EAI045)
was prepared from 2-amino-2-(5-fluoro-2-methoxyphenyl)acetic acid using a
reaction sequence similar to that previously described*” followed by demethylation
with boron tribromide.

1H NMR (400 MHz, DMSO-d6) δ 12.61 (s, 1H), 9.96 (s, 1H), 7.73 (d, J = 7.5 Hz,
1H), 7.66-7.54 (m, 2H), 7.52 (dd, J = 1.0, 7.4 Hz, 1H), 7.49 (d, J = 3.6 Hz, 1H), 7.27
(d, J = 3.5 Hz, 1H), 7.11 (td, J = 3.2, 8.6 Hz, 1H), 6.90 (dd, J = 4.8, 8.9 Hz, 1H), 6.85
(dd, J = 3.1, 9.2 Hz, 1H), 6.31 (s, 1H), 4.61 (d, J = 17.5 Hz, 1H), 3.98 (d, J = 17.5 Hz,
1H); 19F NMR (376 MHz, DMSO-d6) δ −125.15 (s, 1F); LCMS: Rt 1.278 min; ESMS
m/z 384.20 (M+H+).
Extended Data Figure 1 | Inhibition of wild-type and mutant EGFR
kinases by EAI001 and EAI045 in purified enzyme assays. a, Inhibition
of wild-type and mutant EGFR kinases by EAI001. Activity of the
indicated mutant EGFR kinase (residues 696-1022) was measured in the
presence of increasing concentrations of EAI001. The HTRF assay was
carried out using either 1 µM ATP (left) or 1 mM ATP (right). b, Inhibition
of EGFR(L858R/T790M) by EAI045 (left) or erlotinib (right) at a range of 
ATP concentrations, as indicated. Assay was performed using an HTRF- 
based assay as described in the Methods. Error bars indicate s.d. (n =2). 
Extended Data Figure 2 | Comparison of the binding site of EGFR
allosteric inhibitors with those of lapatinib and allosteric MEK
inhibitors. a, Structure of EAI001 in complex with EGFR for comparison.
b, Structure of MEK1 kinase bound to allosteric inhibitor GDC0973 (PDB,
4AN2). GDC0973 (also called XL518, cobimetinib) and other allosteric
MEK inhibitors occupy a pocket created by displacement of the C-helix in
the inactive conformation of the kinase. Most allosteric MEK inhibitors
make hydrogen-bond interactions with the γ-phosphate group of ATP that
are important for their potency. The allosteric EGFR inhibitors we describe
here bind in a generally analogous location in EGFR, but lack any clear
structural similarity to MEK inhibitors and do not contact the γ-phosphate
group of ATP. c, The structure of lapatinib bound to EGFR (PDB, 1XKK).
Both lapatinib and neratinib (see Fig. 1d) bind an inactive conformation
of the kinase. Like gefitinib and erlotinib, both occupy the ATP site, but
also extend into the allosteric pocket occupied by EAI001. Note that like
neratinib, lapatinib places aromatic phenyl or pyridinyl groups in positions
similar to those occupied by the aminothiazole and phenyl substituents of
EAI001.
Extended Data Figure 3 | EAI001 binding is incompatible with the
inactive conformation of wild-type EGFR. Superposition of the EAI001-
bound EGFR structure reported here with the structure of wild-type EGFR
kinase in the inactive conformation (grey, PDB, 2GS7). EAI001 (shown
with carbon atoms in green) clashes with the side chains of leucines
858 and 861 in the wild-type EGFR structure. These leucine residues
lie in a short helical segment at the N terminus of the activation loop.
The L858R substitution disrupts this helix. We propose that this effect
explains, in part, the selectivity of the allosteric inhibitor for the L858R/ 
T790M mutant. Note that EAI001 was crystallized with the EGFR(T790M/ 
V948R), as we were unable to obtain crystals with the L858R/T790M or 
L858R/T790M/V948R proteins. The compound induces unstructuring of 
the activation loop helix and repositions L858, which is in contact with the 
1-oxoisoindolinyl group of the inhibitor. The location and conformation 
of the inhibitor is expected to be the same in the context of the L858R 
mutation, but the details of the interaction with this portion of the 
activation loop will necessarily differ due to the mutation. 
Extended Data Figure 4 | Cellular activity of EAI045. a, EAI045
inhibition of EGFR(L858R/T790M) in NIH-3T3 cells. Western blotting
with the indicated concentrations of the allosteric inhibitor or with 1 µM
WZ4002 as control (WZ) was carried out 6 h after compound addition.
b-e, Profiling of EAI045 in Ba/F3 models bearing mutant EGFR or the
parental Ba/F3 cell line, as indicated. Inhibition by WZ4002 is shown as a
positive control. For gel source data, see Supplementary Fig. 1. 
Extended Data Figure 5 | Cellular and in vivo efficacy of EAI045 in
combination with cetuximab. a, Ba/F3 cells bearing EGFR(L858R/ 
T790M/C797S) were treated with EAI045 alone or with EAI045 plus 
cetuximab and proliferation was measured using the MTS assay after 
72h. b, MRI imaging of cohorts L858R/T790M, exon19del/T790M, and 
L858R/T790M/C797S genetically engineered EGFR-mutant mice before 
treatment and 1 or 2 weeks after treatment with EAI045 and cetuximab. 
These cohorts of tumour bearing mice were used for short term efficacy 
and pharmacodynamic studies, and are distinct from those used for the 
tumour volume measurements shown in Fig. 3. 
Extended Data Table 1 | The selectivity of EAI045 on a panel of kinases


Kinase  Percent inhibition*  (six kinase/value column pairs per row)
ABL1 1 CK1-EPSILON 0 FMS 0 MAP4K5 13 PAK4 0 ROCK1 0
AKT1 0 CK1-GAMMA1 2 FRAP1 0 MAPK1 0 PAK5 0 ROCK2 0
AKT2 0 CK1-GAMMA2 0 FYN 0 MAPK3 0 PAK6 0 RON 0
AKT3 0 CK1-GAMMA3 6 GRK3 0 MAPKAPK2 0 PASK 1 ROS 1
ALK2 0 CK2 0 GRK5 0 MAPKAPK3 0 PDGFR-ALPHA 0 RSK1 0
ALK5 0 CLK1 0 GRK6 0 MARK1 0 PDGFR-BETA 4 RSK2 0
ALK6 0 CLK2 1 GRK7 0 MARK3 0 PDK1 0 RSK3 0
AMP-A1B1G1 1 CLK3 0 GSK-3-ALPHA 6 MARK4 0 PHK-GAMMA1 0 RSK4 0
AMP-A2B1G1 0 CLK4 0 GSK-3-BETA 0 MEK1 0 PHK-GAMMA2 0 SGK1 6
ARG 2 CRAF 0 HASPIN 1 MEK2 0 PI3K-ALPHA 0 SGK2 0
ARK5 0 CSK 0 HCK 0 MELK 0 PI3K-BETA 0 SGK3 0
AURORA-A 0 DAPK1 0 HIPK1 1 MER 8 PI3K-DELTA 0 SIK 0
AURORA-B 4 DAPK3 1 HIPK2 2 MET 0 PI3K-GAMMA 0 SLK 0
AURORA-C 0 DCAMKL2 0 HIPK3 0 MKNK1 0 PI4-K-BETA 9 SNF1LK2 0
AXL 4 DDR2 4 HIPK4 0 MNK2 3 PIM1 0 SPHK1 0
BLK 3 DYRK1A 0 IGF1R 4 MRCK-ALPHA 0 PIM2 2 SPHK2 0
BMX 1 DYRK1B 2 IKK-ALPHA 16 MRCK-BETA 0 PIM3 1 SRC 0
BRAF 0 DYRK3 2 IKK-BETA 0 MSK1 0 PKA 3 SRMS 0
BRK 0 DYRK4 1 IKK-EPSILON 0 MSK2 2 PKACB 0 SRPK1 0
BRSK1 0 EGFR 7 INSR 0 MSSK1 0 PKC-ALPHA 0 SRPK2 10
BRSK2 0 EPH-A1 0 IRAK1 4 MST1 0 PKC-EPSILON 14 STK16 0
BTK 0 EPH-A2 7 IRAK4 0 MST2 0 PKC-ETA 0 SYK 0
CAMK1A 0 EPH-A3 1 IRR 0 MST3 0 PKC-GAMMA 0 TAK1-TAB1 0
CAMK1D 0 EPH-A4 0 ITK 0 MST4 0 PKC-IOTA 0 TAOK2 0
CAMK2A 2 EPH-A5 4 JAK1 0 MUSK 0 PKC-THETA 0 TAOK3 0
CAMK2B 0 EPH-A8 0 JAK2 2 NDR1 2 PKC-ZETA 0 TBK1 3
CAMK2D 4 EPH-B1 0 JAK3 0 NDR2 0 PKN1 0 TEC 4
CAMK2G 9 EPH-B2 0 JNK1 0 NEK1 0 PKN2 1 TIE2 0
CAMK4 1 EPH-B3 0 JNK2 0 NEK2 2 PLK1 Te TNIK 0
CDK1 2 EPH-B4 0 JNK3 0 NEK3 0 PLK3 5 TNK1 0
CDK2-CYCLINA 1 ERB-B2 4 KDR 0 NEK6 0 PLK4 0 TNK2 2
CDK2-CYCLINE 0 ERB-B4 16 KIT 0 NEK7 0 PRAK 0 TRKA 0
CDK3-CYCLINE 0 FER 4 LATS2 2 NEK9 0 PRKD1 0 TRKB 3
CDK4-CYCLIND 0 FES 2 LCK 0 P38-ALPHA 0 PRKD2 2 TRKC 4
CDK5 0 FGFR1 5 LIMK1 0 P38-BETA 0 PRKD3 0 TSSK1 2
CDK5-P25 0 FGFR2 1 LOK 1 P38-DELTA 0 PRKG1 0 TSSK2 0
CDK6-CYCLIND3 2 FGFR3 1 LRRK2-G2019S 0 P38-GAMMA 2 PRKG2 8 TTK 0
CDK7 0 FGFR4 0 LTK 1 P70S6K1 0 PRKX 3 TXK 0
CDK9-CYCLINT1 6 FGR 0 LYNA 9 P70S6K2 0 PTK5 0 TYK2 0
CHEK1 0 FLT-1 8 LYNB 3 PAK1 0 PYK2 4 TYRO3 0
CHEK2 0 FLT-3 0 MAP4K2 0 PAK2 0 RET 0 YES 0
CK1 0 FLT-4 1 MAP4K4 2 PAK3 0 RIPK2 0 ZAP70 1

*Percent inhibition was measured in the presence of 1 µM EAI045. Experiment was performed once with duplicate samples.
Extended Data Table 2 | Selectivity of EAI045 against a panel of non-kinase targets


Assay Name IC50 (µM)

Adenosine 2A receptor binding assay >30 
Adenosine 3 receptor binding assay >30 
Adrenergic Alpha 2C receptor assay >30 
Alpha1A adrenergic calcium flux assay (agonist mode) >30
Alpha1A adrenergic calcium flux assay (antagonist mode) >30 
Beta 1 adrenergic receptor assay >30 
COX-1 assay >30 
CYP3A4 Induction Reporter Gene >10 
Dopamine D2 receptor assay >30 
Dopamine Transporter assay >30 
H1 receptor calcium assay (agonist mode) >30 
H1 receptor calcium assay (antagonist mode) >30 
Histamine H1 receptor assay >30 
Melanocortin MC3 receptor binding assay >30 
Monoamine Oxidase A assay >30 
Muscarinic M1 receptor assay >30 
Muscarinic M2 calcium flux assay with ATP priming (agonist mode) >30 
Nicotinic (CNS) Receptor binding (human IMR32 cells) >30 
Norepinephrine Transporter assay >30 
PPARgamma Receptor agonist assay >30 
PPARgamma Receptor antagonist assay >30 
PXR Receptor agonist assay 16 
PXR Receptor antagonist assay >3 
Phosphodiesterase 3 assay (human platelets) >30 
Phosphodiesterase 4D assay >30 
Pregnane X Receptor (PXR; SXR) binding assay 7
Progesterone Receptor agonist assay >30
Progesterone Receptor antagonist assay >30
Serotonin 5HT2A calcium flux assay (agonist mode) >30
Serotonin 5HT2A calcium flux assay (antagonist mode) >30
Extended Data Table 3 | Crystallographic data collection and refinement statistics

Crystal name                               EGFR T790M/V948R EAI001

Data collection
Space group                                C2
Cell dimensions
  a, b, c (Å)                              155.1, 72.5, 76.0
  α, β, γ (°)                              90, 113.2, 90
Resolution (Å)                             42.24 - 2.31 (2.39)*
Rmerge                                     0.11 (0.51)
I/σ                                        10.4 (1.9)
Completeness (%)                           97.8 (94.0)
Redundancy                                 3.4 (2.9)

Refinement
Resolution (Å)                             42.24 - 2.31
No. reflections                            33377
Rwork/Rfree                                0.174/0.206
No. atoms
  Protein                                  4826
  Ligand/ion (AMPPNP/EAI001/Mg2+)          62/25/2
  Water                                    398
B-factors
  Protein                                  33.70
  Ligand/ion (AMPPNP/EAI001/Mg2+)          25.90
  Water                                    36.60
R.m.s. deviations
  Bond lengths (Å)                         0.008
  Bond angles (°)                          1.136

Diffraction data were recorded from a single crystal.
*Values in parentheses are for highest resolution shell.
Extended Data Table 4 | Cellular activity of EAI045 in lung cancer cell lines

                         EAI045 EC50 (µM)
Cell line (EGFR)         Target modulation*    Proliferation†
H1975 (L858R/T790M)      0.002 (4)             >10 (2)
H3255 (L858R)            0.163 (4)             >10 (2)
HaCaT (WT)               >10 (2)               >10 (1)

*ELISA-based assay for phosphorylation of EGFR Y1173; the number of times each experiment was repeated is given in parentheses.
†The number of times each experiment was repeated is given in parentheses.
Diverse roles of assembly factors revealed by
structures of late nuclear pre-60S ribosomes 


Shan Wu, Beril Tutuncuoglu, Kaige Yan, Hailey Brown, Yixiao Zhang, Dan Tan, Michael Gamalinda, Yi Yuan, Zhifei Li,
Jelena Jakovljevic, Chengying Ma, Jianlin Lei, Meng-Qiu Dong, John L. Woolford Jr & Ning Gao


Ribosome biogenesis is a highly complex process in eukaryotes, 
involving temporally and spatially regulated ribosomal protein 
(r-protein) binding and ribosomal RNA remodelling events in the 
nucleolus, nucleoplasm and cytoplasm!?. Hundreds of assembly 
factors, organized into sequential functional groups*’, facilitate 
and guide the maturation process into productive assembly 
branches in and across different cellular compartments. However, 
the precise mechanisms by which these assembly factors function 
are largely unknown. Here we use cryo-electron microscopy to 
characterize the structures of yeast nucleoplasmic pre-60S particles 
affinity-purified using the epitope-tagged assembly factor Nog2. 
Our data pinpoint the locations and determine the structures of 
over 20 assembly factors, which are enriched in two areas: an arc 
region extending from the central protuberance to the polypeptide 
tunnel exit, and the domain including the internal transcribed 
spacer 2 (ITS2) that separates 5.8S and 25S ribosomal RNAs. In 
particular, two regulatory GTPases, Nog2 and Nog1, act as hub
proteins to interact with multiple, distant assembly factors and 
functional ribosomal RNA elements, manifesting their critical 
roles in structural remodelling checkpoints and nuclear export. 
Moreover, our snapshots of compositionally and structurally 
different pre-60S intermediates provide essential mechanistic 
details for three major remodelling events before nuclear export: 
rotation of the 5S ribonucleoprotein, construction of the active 
centre and ITS2 removal. The rich structural information in our 
structures provides a framework to dissect molecular roles of 
diverse assembly factors in eukaryotic ribosome assembly.

Assembly of pre-60S ribosomes occurs in consecutive stages, orches- 
trated by coordinated groups of assembly factors. The presence or 
absence of three mostly non-overlapping factors in pre-60S particles, 
Nsa1, Nog2 and Nmd3, defines a continuous transition from the nucleolus
through the nucleoplasm to final stages licensing nuclear export
(Extended Data Fig. 1). Nog2, an essential GTPase”, enters pre-60S 
particles in the nucleolus, and is present during most nucleoplasmic 
stages. The lifetime of Nog2 coincides with three important pre-60S 
remodelling and processing events: rotation of the 5S ribonucleopro- 
tein (RNP)®, construction of the active site and cleavage of ITS2 (ref. 5), 
as well as the temporally regulated binding and release of assembly 
factors’. Nog2 departure constitutes a critical checkpoint for nuclear 
export of pre-60S particles®. The remodelling ATPase Rea1 is thought
to catalyse conformational changes in late nucleoplasmic particles that 
stimulate the GTPase activity of Nog2 (ref. 8). This enables release of 
Nog2 and replacement by the key export factor Nmd3, whose binding 
site overlaps with that of Nog2 (refs 8, 9). This model of nucleoplas- 
mic pre-60S maturation was established largely from biochemical and 
genetic experiments (reviewed in ref. 1). Low-resolution cryo-elec- 
tron microscopy (cryo-EM) maps have revealed the location of sev- 
eral different assembly factors®!°-4, but atomic contacts with the 60S 


subunit are only known for Tif6 (ref. 15), Arx1, Alb1 and Rei1 (ref. 16).
Nevertheless, spatial relationships among most of the assembly factors 
in pre-60S particles remain unclear. In particular, key assembly events 
responsible for activating successive maturation checkpoints are yet 
to be determined. 

To further explore the mechanism of late nuclear steps in pre-60S 
assembly, we characterized structures of native nucleoplasmic par- 
ticles isolated from Saccharomyces cerevisiae, using epitope-tagged 
assembly factor Nog2 (Extended Data Fig. 1a). These Nog2-particles 
were subjected to cryo-electron microscopy (cryo-EM) (Extended 
Data Table 1) to determine a series of structures (hereafter termed 
states 1-3) (Extended Data Fig. 2), presumably reflecting temporally 
related snapshots of final maturation steps of pre-60S particles before 
nuclear export. One of these structures, state 1, was solved at a nominal
resolution of 3.08 Å (Extended Data Fig. 3b, c and Supplementary
Video 1). We identified over 30 assembly factors in Nog2-particles 
(Extended Data Fig. 1b). Guided by chemical cross-linking of pro- 
teins coupled with mass spectrometry (XL-MS) (Supplementary 
Table 1), we were able to build atomic models for 19 of these assem- 
bly factors in the density map of state 1 (Fig. 1 and Extended Data 
Fig. 4). Intriguingly, 14 of them are located in the arc region of the 
central protuberance-polypeptide tunnel exit on the intersubunit 
surface (Fig. 1a), and five are immediately adjacent or bound to ITS2 
(Fig. 1a). In addition, we also located Sda1, Rea1 and the Rix1 subcomplex"
in the map of state 2.

Nog2 binds at the centre of the pre-60S particle, via interaction of its
GTPase domain and carboxy (C)-terminal domain with a multi-helical 
junction (Fig. 2), making extensive contacts with H93, H62, H64, H67, 
H69 and H71 of 25S ribosomal RNA (rRNA), and Bud20 (Extended 
Data Fig. 5g). This interaction stabilizes H69 and H71 in a nearly 
180°-flipped position (Fig. 2c-e) compared with their mature forms!”. 
In addition, the C-terminal extended loop of Rpf2 (residues 275-300) 
is inserted into the interface of Nog2-GTPase domain-C-terminal 
domain and H69-H71 (Extended Data Fig. 5a), also contributing to 
the displacement of H69-H71. Unexpectedly, the amino (N)-terminal 
extension of Nog2 (residues 1-200) lacks tertiary structures (Fig. 2a). 
Instead, in its fully extended form, the N-terminal extension of Nog2 
wanders around the inner surface of the tRNA passageway on the 
pre-60S particle and interacts with multiple components, including 
H92, L23, Nog1, H43, Rsa4, Nsa2, Rpf2, Rrs1 and H86 (Extended
Data Fig. 5). The very N terminus of Nog2 ends at a helical junction 
composed of H68, H74, H75 and H93 (Extended Data Fig. 5h). These 
observations nicely explain the previous model that Nog2 represents a 
converging node for different assembly branches and that its recruit- 
ment requires the prior association of multiple assembly factors'*"”. 

Our structures also reveal potential functions for the GTPase
Nog1, which docks to a similar position in the pre-60S particle as its
homologue ObgE does in bacterial large ribosomal subunits”’, with
its N-terminal four-helical-bundle domain (NTD) pointing to the
peptidyl transferase centre (Fig. 3a). Interestingly, the NTD of Nog1
directly passes through H89, separating it into two strands (Fig. 3a-c).
Besides rRNA, the GTPase domain and NTD of Nog1 also interact
with Nog2 and Nsa2, respectively (Extended Data Fig. 6a). The
C-terminal extension (CTE) of Nog1, similar to the N-terminal extension
of Nog2, wraps around the pre-60S particle by over one-quarter
of its circumference. On its way from the P0 stalk base to the polypeptide
tunnel exit, the CTE of Nog1 makes extensive contacts with
nearly all of the assembly factors and r-proteins in this arc region (Tif6,
Rlp24, Arx1, L3, L31, L22, L19, L35) (Fig. 3d) and with a variety of
rRNA helices. The spatial relationship of Nog1 with these assembly
factors agrees well with the previous model for ordered recruitment
and release of assembly factors during cytoplasmic maturation of pre-60S
particles. In particular, the CTE of Nog1 interlocks with Rlp24
by wrapping around a long helix (residues 85-130) at the C-terminal
end of Rlp24 (Fig. 3d and Extended Data Fig. 6b), suggesting that
these two assembly factors might be recruited and released as a subcomplex’.
Indeed, the release of Nog1 and replacement of Rlp24 with
L24 in the cytoplasm is catalysed by the ATPase Drg1 (refs 21, 22).
Surprisingly, at the polypeptide tunnel exit, the CTE of Nog1 turns into
the polypeptide tunnel, extending all the way through the tunnel to
the peptidyl transferase centre (Fig. 3e). It is tempting to hypothesize
that this C terminus of Nog1 (~75 residues) might enable polypeptide
exit tunnel assembly, and/or test-drive the tunnel by surveying the
conformational status of tunnel wall components (such as L39, L17
and L4). Altogether, our data suggest several distinct roles for Nog1
in the maturation of pre-60S particles. While the NTD might serve to
remodel the peptidyl transferase centre, the CTE of Nog1 apparently
acts as a scaffold for assembly of many assembly factors and r-proteins,
and might participate in quality control of polypeptide tunnel construction.
Notably, a recent study showed that the tunnel is again
probed in the cytoplasm by Rei1 (ref. 16) in a similar fashion to the
CTE of Nog1 (Extended Data Fig. 6c, d), indicating the existence
of continuous proofreading of the ribosomal tunnel from the nucleolus
to the cytoplasm.

Figure 1 | Cryo-EM structure (state 1) of the pre-60S particle purified
from epitope-tagged Nog2. a, The 3.08-Å cryo-EM map of state 1 is
displayed in surface representation, with density of each assembly factor
separately coloured. The 25S rRNA and r-proteins are coloured grey and
beige, respectively. Both the intersubunit (left) and side (right) views are
shown. b, Atomic models of 19 well-resolved assembly factors (coloured as
in a) superimposed with their segmented cryo-EM densities (transparent
grey).

In the map of state 1, a portion of the ITS2 pre-rRNA spacer is 
well resolved. This includes 59 nucleotides extending from the 
3’-end of 5.8S rRNA, and 6 nucleotides of ITS2 at the 5’-end of 25S 
rRNA (Fig. 4a-c), consistent with the presence of 25.5S and 7S pre- 
rRNAs in Nog2-particles°. In addition to known ITS2-binding factors 
Nop15, Rlp7 and Cic1 (ref. 23), we also identified Nop7 and Nop53
(ref. 24) in the region of ITS2 (Fig. 4d). This close co-localization of 
Nop15, Rlp7, Cic1, and Nop7 around ITS2 explains their mutually
interdependent association with pre-60S particles!. Notably, Nop53 
is required to recruit Mtr4 which participates in exosome-mediated 
ITS2 removal®°. Three r-proteins, L8, L25 and L27, also interact with 
these ITS2 factors (Extended Data Fig. 7). L8 directly contacts Nop15, 
Cic1 and Nop7 (Extended Data Fig. 7a-d), complementing previous
data that L8 is required for the assembly of these A3 factors”®. The high 
protein content in the ITS2 region further suggests that these factors 
function to chaperone and protect ITS2 for proper processing. Given 
the space required for progressive trimming of 7S pre-rRNA from its 
3′-end by the exosome and other nucleases”, it is conceivable that
de-coating of assembly factors from ITS2 is coordinated with stepwise 
removal of ITS2. The de-coating process has to be accurately con- 
trolled, as depletion of A3 factors leads to rapid turnover of pre-rRNAs 
(reviewed in ref. 1). 
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State 1 


Figure 2 | Structure of Nog2 and its remodelling role in central helices 
H69-H71. a, Atomic structure of Nog2 (2-472 amino acids) with domains 
separately coloured, highlighting the N-terminal extension of Nog2. The 
orientation of Nog? in the pre-60S particle is shown in the left thumbnail. 
b, Local resolution map of Nog2. Segmented Nog2 density map is coloured 


In the structure of state 1, the 5S RNP (the subcomplex of 5S rRNA, 
L5 and L11) is positioned almost 180°-rotated from its position in 
the mature subunit, as previously reported®!!. Two central-protu- 
berance-binding factors, Rpf2 and Rrs1, which anchor the 5S RNP 
to the pre-60S particles in an earlier stage'*”°, are apparently crucial 
to maintain this distinct conformation of the 5S RNP, as they provide 


H89 H89 
Pre-60S position 


Figure 3 | Structure and binding partners of Nogl. a, The NID 

of Nog] inserts directly into the two strands of H89. GD, GTPase 
domain. b, Superimposition of the NTD of Nog1 with H89 in its mature 
conformation, displaying a steric clash in the terminal tip of the NTD 
of Nog]. ¢, Structural comparison of H89 in the pre-60S and mature 
conformations. d, The CTE of Nog] interacts with multiple assembly 
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according to the scale bar below. c, Zoom-in view of the H69-H71 region 
in the map of state 1, superimposed with atomic models of Nog2, H69 
and H71. d, Same as ¢, but for the density map of the mature 60S subunit. 
e, Comparison of H69-H71 in the two density maps. 


a support to the floating helical stem of the 5S RNP in the middle 
(Fig. 1a). In addition, rRNA helices of the central protuberance, stabi- 
lized by several interacting factors (Rpf2, Rrs1, Nsa2, Rsa4 and Nog2), 
display radically different conformations compared with those in the 
mature 60S subunit (Extended Data Fig. 8). Many of them are in com- 
pletely topside-down or inside-out positions (Extended Data Fig. 8a). 




factors (Tif6, Rlp24, Arx1) and r-proteins (L3, L31, L22, L19, L35) in an arc 
region of the pre-60S particle. The overview is shown in the left thumbnail. 
The position and direction of polypeptide tunnel exit is denoted by a black 
diamond. e, The last C-terminal portion of Nog1 goes into the polypeptide 
tunnel and interacts with L4, L17, and L39 (see also Extended Data Fig. 6). 
Figure 4 | Structure of ITS2 and associated factors. a, Secondary 
structure of the partial ITS2 rRNA sequences resolved in the map of state 1. 
b, Atomic model of the partial ITS2 rRNA. c, Same as b, with density map 


The most dramatic change is that H80 is stretched into a single 
strand. 

Comparison of the three states (1-3) indicates that the 5S RNP is 
in a different position in each state, reflecting snapshots of continu- 
ous rotational movement of the central protuberance (Extended Data 
Fig. 9a, b). In the structure of state 2, Rpf2 and Rrs1 are absent, and 
the 5S RNP has already rotated to a near-mature position, suggesting 
that removal of Rpf2 and Rrs1 is necessary for the rotation to occur. 
The structure of state 2 (~6.6 Å resolution) is very similar to that of 
the recently characterized Rix1-Rea1 particles, containing five additional 
factors (Extended Data Fig. 9e-h): Sda1, the Rix1 subcomplex (Ipi1, 
Rix1 and Ipi3), and Rea1. Sda1, with its characteristic HEAT (huntingtin, 
elongation factor 3, protein phosphatase 2A and lipid kinase TOR) 
repeat domain sandwiched between the L1 stalk and H38, pulls the 
L1 stalk into an inward position (Extended Data Fig. 9e, f). The Rix1 
subcomplex sits above Sda1, contacting the gigantic remodelling 
ATPase Rea1 situated above the central protuberance (Extended 
Data Fig. 9g, h). Therefore, removal of Rpf2-Rrs1 might lead to binding 
of Sda1 and the Rix1 subcomplex, as well as Rea1 that subsequently 
releases Rsa4 (ref. 30). This last remodelling event enables 
further accommodation of the 5S RNP in the mature-like position 
observed in the structure of state 3. Notably, the stepwise confor- 
mational maturation of the 5S RNP is coordinated with sequential 
conformational changes of H38 in the three structures. Interestingly, 
repositioning of H38 from state 1 to state 2 involves the confor- 
mational change of the C terminus of Cgr1, from a bent helix to a 
straightened form (Extended Data Fig. 9c, d). 

In summary, the rich atomic information presented in our struc- 
tures provides a valuable resource to interpret and integrate a large 
body of existing genetic and biochemical data of eukaryotic ribo- 
some assembly. In particular, it demonstrates potential diverse roles 
of assembly factors in late nuclear stages of large subunit assembly, and 
reveals unprecedented mechanistic details for two essential assembly 
GTPases, Nog1 and Nog2. 
METHODS 


No statistical methods were used to predetermine sample size. The experiments 
were not randomized. The investigators were not blinded to allocation during 
experiments and outcome assessment. 

Purification of Nog2-particles. Pre-ribosomes were purified by tandem affinity 
purification (TAP) with magnetic Dynabeads (Invitrogen) as explained 
previously. TAP-tagged Nsa1, Nog2 and Nmd3 were used as baits to isolate 
ribosome assembly intermediates. The protein composition of the TAP-purified 
pre-ribosomes was determined by SDS-PAGE (4-10% Tris-glycine and 4-12% Bis-Tris, 
Invitrogen) followed by silver-staining. Protein levels in each intermediate 
were assayed by western blotting analysis. Furthermore, the proteins associated 
with each intermediate were identified by mass spectrometry. Purified samples 
were sent to Penn State Hershey Core Research Facilities for trypsin digestion and 
matrix-assisted laser desorption/ionization-time of flight analysis. Results were 
analysed by Protein Pilot software and proteins identified with >99.9% confidence 
were used for further analysis. 

XL-MS analysis. The Nog2-particles containing ~10 µg total proteins were incubated 
with BS3 or DSS at 1:1 (w/w) protein-to-cross-linker ratio at 25 °C for 1 h 
before the cross-linking reaction was quenched with 20 mM ammonium bicarbonate. 
Proteins were then precipitated with acetone, dissolved in 20 µl 8 M urea, 
100 mM Tris, pH 8.5, and digested with trypsin at 37 °C overnight. Liquid 
chromatography-tandem mass spectrometry (LC-MS/MS) analyses of the digested 
samples were performed on an EASY-nLC 1000 system (Thermo Fisher Scientific) 
interfaced to a Q-Exactive mass spectrometer (Thermo Fisher Scientific). Peptides 
were separated on a 75 µm × 10 cm analytical column packed with 1.8 µm, 120 Å 
UHPLC-XB-C18 resin (Welch Materials) over a 110-min linear gradient made 
with buffer A (0.1% formic acid in HPLC-grade water) and buffer B (0.1% formic 
acid in HPLC-grade acetonitrile) as follows: 0-3 min, 0-5% B; 3-93 min, 
5-30% B; 93-100 min, 30-80% B; 100-110 min, 80% B. The flow rate was set to 
200 nl min⁻¹. The mass spectrometer was operated in data-dependent mode with 
one MS1 event at resolution 70,000 followed by ten HCD MS2 events at resolution 
17,500. Dynamic exclusion time was set to 60 s. Precursors with a charge state of 
+1, +2 or unassigned were rejected. Three analytical replicates were performed 
for both BS3- and DSS-cross-linked samples. To identify proteins in the sample, 
we performed an additional LC-MS/MS analysis without rejecting precursors 
of +2 charge state, and the MS data were then searched against an S. cerevisiae 
protein database using ProLuCID. After filtering the ProLuCID search results 
using DTASelect2, 264 proteins were identified (false discovery rate for protein 
identity = 0.46%) and a database containing the sequences of these proteins was 
constructed for pLink search. Cross-linked peptides were identified using this 
database and the pLink software, and the results were filtered by requiring false 
discovery rate < 0.05, E value < 0.0001, and spectral count > 2, which resulted in 
identification of 282 cross-linked peptide pairs (Supplementary Table 1). Results 
of XL-MS analysis, including information for peptide pair, statistical significance 
(E value), calculated mass, resolution (Δmass) and mass accuracy (parts per 
million), are summarized in Supplementary Table 2. The XL-MS data have 
been deposited in the ProteomeXchange Consortium under data set identifier 
PXD003736, which contains one SEARCH file (pLink search result, false discovery 
rate < 0.05), seven PEAK files (ms2 files) and seven RAW files. 
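
The LC gradient described above is piecewise linear, so the buffer B percentage at any time point follows by interpolation between the stated breakpoints. The sketch below is illustrative only and is not part of the published protocol; the function name `percent_b` and the breakpoint table are assumptions built from the times and percentages given in the text.

```python
from bisect import bisect_right

# (time in min, % buffer B) breakpoints, taken from the gradient in the text:
# 0-3 min, 0-5% B; 3-93 min, 5-30% B; 93-100 min, 30-80% B; 100-110 min, 80% B
GRADIENT = [(0.0, 0.0), (3.0, 5.0), (93.0, 30.0), (100.0, 80.0), (110.0, 80.0)]


def percent_b(t_min: float) -> float:
    """Linearly interpolate % buffer B at time t_min (minutes)."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-110 min gradient")
    times = [t for t, _ in GRADIENT]
    # Index of the segment end; clamp so t = 110 uses the last segment
    i = min(bisect_right(times, t_min), len(GRADIENT) - 1)
    (t0, b0), (t1, b1) = GRADIENT[i - 1], GRADIENT[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (0, 3, 48, 96.5, 110):
        print(f"{t:>6} min: {percent_b(t):.1f}% B")
```

Halfway through the long 3-93 min segment (48 min) the mixture is 17.5% B, which shows how shallow the main separation ramp is compared with the 93-100 min wash.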

Cryo-EM data acquisition. Vitrified specimens were prepared by adding 4-µl 
samples of Nog2-particles at a concentration of ~150 nM to a glow-discharged holey 
carbon grid (Quantifoil R2/2) covered with a freshly made thin carbon film. Grids 
were blotted for 1 s and plunge-frozen into liquid ethane using an FEI Vitrobot 
Mark IV (4 °C and 100% humidity). Cryo-grids were transferred to an FEI Titan 
Krios electron microscope that was operating at 300 kV, and images were recorded 
using a K2 Summit direct electron detector (Gatan) in counting mode at a nominal 
magnification of ×22,500, corresponding to a pixel size of 1.32 Å at the object 
scale and with the defocus varying from −1.0 to −2.0 µm. All micrographs with 
the K2 camera were collected using UCSF Image4 (developed by X. Li and Y. Cheng) 
under low-dose conditions. Each micrograph was dose-fractionated to 32 frames 
with a dose rate of ~8.2 counts per physical pixel per second for a total exposure 
time of 8 s. A fraction of micrographs were also recorded using a Titan Krios 
(FEI) microscope operated at 300 kV under low-dose conditions with an FEI Eagle 
4k × 4k CCD camera, using the automated data collection software AutoEMation. 
Image processing. Original image stacks were summed and corrected for drift 
and beam-induced motion at micrograph level using MOTIONCORR (developed 
by X. Li and Y. Cheng). Programs of SPIDER and EMAN2 (ref. 38) were 
used for micrograph screening, automatic particle picking and normalization. 
The contrast transfer function parameters of each micrograph were estimated by 
CTFFIND3 (ref. 39). All 2D and 3D classification and refinement were performed 
with RELION. Two-dimensional reference-free classification was applied to further 
screen particles (Extended Data Fig. 2a). At first, four batches of data were 


collected (Extended Data Table 1) and processed separately following the same 
procedures. For each batch, particles were split into ten classes during the first 
round of 3D classification, with a map of the mature 60S ribosomal subunit (low-pass 
filtered to 60 Å) as the initial model. Based on the map features (the presence 
of ITS2 and the rotation of the 5S RNP), classes were combined and subjected to the 
second and third rounds of 3D classification. Around 30% of the particles in the first four 
batches belong to state 1 (solid densities for ITS2 and the 5S RNP in a premature 
unrotated position). However, for the first four batches of data, particles displayed 
a strong orientation preference, which led to a noticeable distortion in the final 
density maps. Although the nominal resolutions of these maps were in the range 
of 3.8-4.5 Å, the distortion prevented accurate atomic modelling. To limit those 
strongly over-represented angular projections, SPIDER and RELION were used to 
balance the particles within different projection groups (by limiting the maximal 
number of particles for each projection group) during 3D refinement. This 
additional procedure improved the map appearance to a certain extent, but 
could not completely eliminate the distortion in the final density maps. Another 
attempt was performed by combining the first four batches of data before 2D and 
3D classification. All particles from the first four batches that belonged to state 1 
were grouped together and subjected to 3D refinement with orientation-limiting 
procedure applied. However, the orientation preference still limited the high- 
resolution refinement and atomic modelling. Therefore, a series of optimizations in 
cryo-grid preparation were applied before the collection of the fifth data set, includ- 
ing elevated sample concentration, prolonged glow-discharge time, and reduced 
blotting time. As a result, there was no detectable orientation preference in the fifth 
data set (batch 8 in Extended Data Table 1a). For this batch of data, 184,222 raw 
particles were picked from 833 micrographs for several rounds of reference-free 2D 
classification, yielding 143,707 good particles for 3D classification. A map of state 1 
(low-pass filtered to 60 Å) was used as the initial reference for the 3D classification, 
which split the particles into eight classes (Extended Data Fig. 2b). One (A5) of the 
eight classes (8% of total particles) was discarded. Four of them belonged to state 
1. The rest of these classes represented a series of intermediate structures. Another 
two batches of cryo-EM data were obtained (batches 9 and 10 in Extended Data 
Table 1a), which resulted in 304,296 particles for 3D classification into eight classes 
(B1-B8) (Extended Data Fig. 2b). Six of them (B3-B8) belonged to state 1, and 
as a result, they were combined for further high-resolution structural refinement. 
Comparison of state 1 structures from batch 8 and batches 9-10 indicates that the 
quality of the last two batches of particles was slightly better, according to the density 
appearance of Cgr1 in the density map. Therefore, a homogeneous data subset 
(191,848 particles) for state 1 was obtained (B3-B8), from which a density map 
with 3.8-Å resolution (gold-standard Fourier shell correlation (FSC) 0.143 criterion) 
was constructed. To reduce the possible radiation damage to the particles, only 
frames 3-16 of each image stack were selected to generate a set of dose-reduced 
micrographs. A new set of particles was re-windowed from the dose-reduced micrographs 
and subjected to 3D refinement, which improved the resolution to 3.6 Å. 
A soft-edged mask was then applied during final rounds of the high-resolution 
refinement, further improving the resolution to 3.46 Å. The final density map was 
corrected for the modulation transfer function of the K2 detector, sharpened by applying 
a negative B-factor automatically estimated by the post-processing program of 
RELION, and corrected for the soft-mask-induced effects on FSC curves using 
high-resolution noise substitution, resulting in a 3.08-Å density map for state 1. 
The local resolution map was estimated using ResMap. 
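
The orientation-balancing step mentioned above (limiting the maximal number of particles in each projection group) can be pictured with a short sketch. This is not the SPIDER/RELION procedure used by the authors; it assumes each particle already carries a projection-group label, and it picks the retained particles at random, which the text does not specify. The function name `cap_per_group` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def cap_per_group(group_ids, max_per_group, seed=0):
    """Return sorted particle indices kept after capping every projection group.

    group_ids     -- one projection-group label per particle (assumed input)
    max_per_group -- maximum number of particles retained per group
    seed          -- seed for the (assumed) random choice of retained particles
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for idx, group in enumerate(group_ids):
        by_group[group].append(idx)

    kept = []
    for members in by_group.values():
        if len(members) > max_per_group:
            # Over-represented view: keep only a random subset
            members = rng.sample(members, max_per_group)
        kept.extend(members)
    return sorted(kept)


if __name__ == "__main__":
    # Toy example: view 0 is strongly over-represented
    labels = [0] * 10 + [1] * 3 + [2] * 1
    kept = cap_per_group(labels, max_per_group=3)
    print(len(kept), "of", len(labels), "particles kept")
```

Capping evens out the angular distribution at the cost of discarding data, which is consistent with the observation in the text that it improved the maps only to a certain extent.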

To further improve the density map of state 2, all non-state 1 particles from 
batches 8, 9 and 10 were combined (168,267 particles in total) and subjected to a 
round of 3D classification (Extended Data Fig. 2b) into eight classes. Around 6.5% 
of particles (10,900) belonged to state 2 (C8), and refinement of these particles 
rendered a final density map at a nominal resolution of 6.6 Å. 
Model building and refinement. The crystal structure of the yeast 80S ribosome (PDB 
accession number 3U5D) was used as the initial template for rRNA modelling. 
The models of the rRNAs (25S, 5.8S) were docked into the density map manually 
using UCSF Chimera. The 5S rRNA was separately fitted into its density by 
rigid-body docking. The crystal structure of the 25S rRNA was compared with 
that of the Arx1-TAP pre-60S structure (PDB accession number 3J64) in the 
density map, and fragments of nucleotides 995-1054, 2244-2318, 2615-2771 and 
2789-2804 of the crystal structure were cut out and fitted into our density map. 
After the initial fitting, the entire chains of rRNAs were manually checked and 
adjusted with COOT. 

For modelling of ITS2 RNA, secondary structures were predicted using 
RNAfold and drawn using RnaViz. Atomic modelling of ITS2 RNA was performed 
with COOT, starting with a poly-adenine model, followed by sequence 
replacement. 

For r-protein modelling, structures of individual proteins from the crystal structure 
of yeast 80S ribosome (PDB accession number 3U5E) were separately fitted 
into the density map using Chimera. Except for L10, L24, L29, L40, L41 and L42, 
which are absent in the density map of Nog2-particles, chains of the remaining 
r-proteins were manually adjusted using COOT. Structures of L5 and L11 were 
first docked into the density map in a subcomplex with the 5S rRNA, followed by 
a similar manual adjustment in COOT. 

For modelling of biogenesis factors, the sequences of all factors according to 
the result of mass spectrometry (Extended Data Fig. 1b) were subjected to 2D and 
3D structure prediction, using PSIPRED and I-TASSER, respectively. Initial 
fitting of biogenesis factors, such as Nog2, Nop15, Rlp7, Nop7 and Cic1, was 
guided by previous biochemical data and our XL-MS data, and was confirmed 
by high agreement of secondary structural features between the predicted 3D 
models and the density map. Specifically, for each factor, the five 3D models 
predicted by I-TASSER were aligned in PyMOL, and the common structural 
motifs were selected and used for rigid body fitting in Chimera. Taking Nog2 as 
an example, the GTPase domain (residues 207-369) and the C-terminal domain 
(residues 373-486) were first separately fitted in Chimera, followed by manual 
adjustment of main chains and side chains in COOT. Linker building and further 
extension of chains in both the N- and C-directions were done manually in 
COOT. Information from secondary structural prediction was used to aid main-chain 
tracing. In many cases, poly-alanine models were first built, and sequence 
assignments were aided by well-resolved bulky residues such as Phe, Tyr, Trp 
and Arg. As for factors Nog1, Rlp24, Rsa4, Arx1 and Mrt4, their initial positions 
were taken from previous low-resolution cryo-EM studies, followed 
by extensive model rebuilding in COOT. In particular, main-chain tracing of the 
C-terminal extension of Nog1 was done completely manually. For factors Rpf2 
and Rrs1, their S. cerevisiae models were generated using CHAINSAW in the 
CCP4 suite with the crystal structures of Aspergillus nidulans Rpf2 and Rrs1 
(PDB accession numbers 4XD9 and 5BY8) as templates. The crystal structure 
of the yeast Tif6 (PDB accession number 1G62) was docked into the density 
map to provide an initial position. The modelling of the factors in the ITS2 region 
was largely facilitated by our XL-MS data, as some portions of these factors were 
not resolved in the map. The modelling of Nsa2 was facilitated by the crystal 
structure of Rsa4 in complex with Nsa2 peptide (PDB accession number 4WJV), 
which provided an anchor point for the N- and C-terminal halves of Nsa2 during 
atomic modelling. 

Docking of Sda1 and Rea1 in the density map of state 2 was facilitated by the 
recent cryo-EM study of Rix1-Rea1 particles. The models of Sda1 and Rea1 (PDB 
accession number 5FL8) were fitted into our density map of state 2 as rigid bodies 
(Extended Data Fig. 9e-h). 

The atomic model of state 1 containing ribosomal proteins, rRNAs and assembly 
factors was refined against the density map first by real-space refinement 
(phenix.real_space_refine) in PHENIX with secondary structure and geometry 
constraints applied. After refinement, alternating rounds of manual model 
adjustment using COOT and model refinement using PHENIX were applied. 
A final round of model refinement was done in Fourier space using REFMAC 
with secondary structure, base pair and planarity restraints applied, according to 
previously established protocols. To avoid overfitting, different weights of the 
density map for refinement were tested. Cross-validation against overfitting was 
performed following the procedures previously described. The atom positions 
of the atomic model were randomly displaced by 0.5 Å before the model was 
refined against a map reconstructed from half of the data (named Half1 map) 
produced by RELION during the last iteration of high-resolution structural refinement. 
Two FSC curves were calculated on the basis of the refined model: one 
was FSCwork (model versus Half1 map) and the other was FSCtest (model versus 
Half2 map). In addition, another FSC curve was calculated for the comparison of 
the refined model with the final density map. Comparison of the FSCtest and FSCwork curves 
showed no large separation between them, indicating the final atomic model 
was not overfitted. Statistics of the final model were evaluated using MolProbity 
(Extended Data Table 1b). 
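
The FSC curves used above (gold-standard resolution estimate, FSCwork and FSCtest) all rest on the same shell-wise correlation between two volumes in Fourier space. The NumPy sketch below illustrates that calculation only; it is not the RELION implementation, the function names `fsc` and `resolution_at` are hypothetical, and reading the resolution at the first shell below the threshold is one convention among several.

```python
import numpy as np


def fsc(vol1, vol2):
    """Fourier shell correlation between two cubic volumes of equal shape.

    Returns one value per integer-radius shell, from shell 0 to Nyquist.
    """
    if vol1.shape != vol2.shape or vol1.ndim != 3 or len(set(vol1.shape)) != 1:
        raise ValueError("volumes must be cubic, 3D and of equal shape")
    n = vol1.shape[0]
    f1 = np.fft.fftn(vol1)
    f2 = np.fft.fftn(vol2)

    # Integer frequency index along each axis, then a shell index per voxel
    freq = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)

    n_shells = n // 2 + 1
    mask = shell < n_shells  # ignore the corners beyond Nyquist
    idx = shell[mask]
    cross = np.bincount(idx, weights=(f1 * np.conj(f2)).real[mask], minlength=n_shells)
    p1 = np.bincount(idx, weights=(np.abs(f1) ** 2)[mask], minlength=n_shells)
    p2 = np.bincount(idx, weights=(np.abs(f2) ** 2)[mask], minlength=n_shells)
    denom = np.sqrt(p1 * p2)
    return np.divide(cross, denom, out=np.zeros(n_shells), where=denom > 0)


def resolution_at(curve, box_size, pixel_size, threshold=0.143):
    """Resolution (in Å) of the first shell where the curve drops below threshold."""
    below = np.nonzero(curve[1:] < threshold)[0]
    if below.size == 0:
        return None  # curve never crosses the threshold
    shell = below[0] + 1
    return box_size * pixel_size / shell


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vol = rng.normal(size=(16, 16, 16))
    print(fsc(vol, vol))  # identical volumes correlate perfectly in every shell
```

In the cross-validation described in the text, the first volume would be a map computed from the refined model and the second would be the Half1 map (FSCwork) or the Half2 map (FSCtest); closely matching curves are what the authors take as evidence against overfitting.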

Of the 282 cross-linked peptide pairs identified in the XL-MS data, the distances 
of 151 lysine pairs could be calculated from the model of state 1. Ninety-four per 
cent of them (142) agree with the model, with Cα-Cα distances < 24 Å between 
the two cross-linked lysine residues. Among the nine incompatible pairs, five 
have Cα-Cα distances < 30 Å. 
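
The compatibility check above amounts to measuring the Cα-Cα distance for each cross-linked lysine pair in the model and comparing it with a cutoff. The following is a minimal sketch of that bookkeeping, not the authors' script: the function name `check_crosslinks`, the residue keys and the coordinates are invented for illustration, and only the 24 Å cutoff and the 142-of-151 count come from the text.

```python
from math import dist


def check_crosslinks(ca_coords, pairs, cutoff=24.0):
    """Split cross-linked residue pairs by whether their Cα-Cα distance is < cutoff (Å).

    ca_coords -- mapping from a residue key to its Cα (x, y, z) coordinates
    pairs     -- iterable of (key_a, key_b) cross-linked residue pairs
    """
    satisfied, violated = [], []
    for a, b in pairs:
        d = dist(ca_coords[a], ca_coords[b])
        (satisfied if d < cutoff else violated).append((a, b, d))
    return satisfied, violated


if __name__ == "__main__":
    # Made-up coordinates, for illustration only
    coords = {
        ("Nog2", 100): (0.0, 0.0, 0.0),
        ("Nop7", 50): (10.0, 0.0, 0.0),
        ("Rlp7", 20): (0.0, 30.0, 0.0),
    }
    links = [(("Nog2", 100), ("Nop7", 50)), (("Nog2", 100), ("Rlp7", 20))]
    ok, bad = check_crosslinks(coords, links)
    print(len(ok), "satisfied;", len(bad), "violated")

    # Fraction reported in the text: 142 of 151 measurable pairs
    print(f"{142 / 151:.1%}")
```

With the numbers in the text, 142 of 151 pairs corresponds to 94.0%, in line with the stated ninety-four per cent.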
Extended Data Figure 1 | Compositional analysis of Nsa1, Nog2 and 
Nmd3 particles. a, Mostly non-overlapping assembly factors Nsa1, 
Nog2 and Nmd3 were used to purify sequential ribosome assembly 
intermediates. Proteins identified by mass spectrometry analysis were 
marked on the gel. Orange coloured proteins are only present in Nsa1-TAP 
particles, green coloured proteins are present both in Nsa1-TAP 
and in Nog2-TAP particles, light blue coloured proteins are present in all 
three purified particles to varying levels, dark blue coloured proteins are 
present only in Nog2-particles, pink coloured proteins are present both in 
Nog2- and Nmd3-particles in varying levels and yellow coloured proteins 
are present only in Nmd3-particles. TAP-tagged proteins are indicated 
by white asterisks. For gel source data, see Supplementary Fig. 1. b, The 
lifetimes of mostly non-overlapping ribosome assembly intermediates 
containing assembly factors Nsa1, Nog2 and Nmd3 are indicated. 
Assembly factors identified in each of Nsa1-TAP, Nog2-TAP and Nmd3-TAP 
associated samples were colour coded. The colour scheme is identical 
to that used in a. *Even though this protein was identified in all three 
intermediates, its levels decreased more than sevenfold from Nsa1-TAP 
particles to Nog2-TAP particles. 
[Flow chart, panel b: 3D classification of batch 8 (143,707 particles, 8 classes) and of batches 9 and 10 (304,296 particles, 8 classes); 3D classification of the combined non-state 1 particles (168,267 particles, 8 classes); refinement of state 1 (191,848 particles).] 

Extended Data Figure 2 | Cryo-EM data processing of Nog2-particles. a, Representative 2D class averages of Nog2-particles. b, A flow-chart for 
3D classification of Nog2-particles (data batch 8-10, see Methods for details). 
Extended Data Figure 3 | Resolution estimation and model validation. 
a, Representative micrograph of Nog2-particles. b, Local resolution 
map of the final density map of state 1. c, FSC curve for the final density 
map (state 1). The nominal resolution is 3.08 Å estimated using the 
gold-standard (FSC = 0.143) criterion. d, Atomic model cross-validation. 
Three FSC curves were calculated between the refined model (against 
Half1 map) and the final map (black), between the refined model and the 
Half1 map (FSCwork, red), and between the refined model and the Half2 map 
(FSCtest, blue) (see Methods for details). 
Extended Data Figure 4 | Local densities of representative regions for different assembly factors. a-l, Cryo-EM densities of representative regions of 
assembly factors, superimposed with respective atomic models. 
Extended Data Figure 5 | Interaction network of Nog2 in the pre-60S 
particle. a-g, Pairwise illustration of binding partners of Nog2 in the 
pre-60S particle. Residues of Nog2 involved in atomic contacts are 
coloured red with residue numbers labelled. H and L denote helix and 
loop, respectively. h, Interactions between rRNA components (H43, H68, 
H74, H75, H86, H92, H93) and Nog2. For clarification, H69 and H71 
are not shown. The N terminus of Nog2 is located in a helical junction 
composed of H68, H74, H75 and H93. 
Extended Data Figure 6 | The NTD of Nog1 interacts with Nsa2 and 
Nog2. a, Nsa2, Nog2 and Nog1 collectively stabilize H89 in a distinct 
conformation. Nog1 interacts with Nog2 and Nsa2 through its GTPase 
domain and NTD, respectively. b, The CTE of Nog1 interlocks with Rlp24 
by wrapping around a long helix at the C-terminal end of Rlp24 
(see also Fig. 3). c, d, Comparison of the CTE of Nog1 and the CTE of Rei1 
in the polypeptide tunnel. Atomic models of state 1 (c) and 60S-Arx1-Alb1-Rei1 
(d) (PDB accession number 5APN) are aligned using the 
60S subunit. For clarification, only Arx1, Nog1 and Rei1 are shown. 
e, Superimposition of c and d. Four major places of steric clash between 
Rei1 and Nog1 are marked by asterisks. 
Extended Data Figure 7 | Mutual interactions between factors and 
r-proteins in the ITS2 subcomplex. a, An overall view of the ITS2 
subcomplex. b-d, L8 interacts with three factors: Nop15 (b), Cic1 (c) and 
Nop7 (d). e, L27 interacts with Nop53. f-h, L25 interacts with Rlp7 (f), 
Nop15 (g) and Nop53 (h). Residues involved in atomic interaction sites are 
labelled with sequence numbers. H, L, S denote helix, loop and strand of 
respective structures. 
Extended Data Figure 8 | Restructuring of rRNA helices in the 
central protuberance region by Nsa2, Rpf2, Rsa4, Rrs1 and Nog2. 
a, Conformation of rRNA helices from the central protuberance (H80, 
H82-H88, 5S rRNA) in the pre-60S particle (state 1). b, Same as a, but 
for the mature 60S subunit. The mature 60S subunit was aligned to 
state 1 structure globally. c-g, Pairwise interactions between the central 
protuberance helices and factors are shown in separate panels. 
Extended Data Figure 9 | Structures of different assembly states of the 
pre-60S ribosomal particles. a, Cryo-EM density maps of three premature 
states (1-3) and the mature state are displayed in transparent surface 
representation, superimposed with models of the 5S RNA, H38 
and associated central-protuberance-binding factors. b, Zoom-in views 
of the central protuberance regions in a. For clarification, only atomic 
models are shown. Comparison of these four states indicates that the 5S 
RNP rotates to a near-mature state (state 2) after Rpf2-Rrs1 leave, and 
further release of Rsa4 in state 3 results in a ‘mature-like’ conformation 
for the 5S RNP. H38 from these four states is in a series of continuous 
changes coupled with the 5S RNP conformational maturation. c, d, Spatial 
relationship of the 5S RNP, H38, Rsa4 and Cgr1 in state 1 (c) and state 2 (d). 
Note that repositioning of H38 from state 1 to state 2 is coupled with a 
dramatic conformational change on the C-terminal end of Cgr1. 
e-h, Additional assembly factors identified in the density map of state 2. 
One piece of additional density between H38 and L1 contains a 
characteristic HEAT repeat, which contacts the L1 stalk in an inward 
position (e). The atomic model of Sda1 (PDB accession number 5FL8) fits 
well with the segmented density (f). For clarification, densities immediately 
above Sda1 are not shown in e and f. A large piece of additional density in 
the map of state 2, composed of the Rix1 subcomplex and Rea1 (g, h). 
The density assignment was facilitated by the cryo-EM structure of Rix1-Rea1 
particles. Superimposition of the atomic model of Rea1 (PDB 
accession number 5FL8) with the segmented density map of Rea1 (h). 




Extended Data Table 1 | Statistics of data collection, structural refinement and model validation 

a 
Batch  Electron microscope  Camera  Micrographs (original micrographs)  Particles for 2D classification  Particles for 3D classification 
1      F20                  US4000  381 (966)                           76,323                           19,253 
2      Titan Krios          Eagle   3,184 (4,701)                       154,785                          133,455 
3      Titan Krios          K2      1,017 (1,136)                       90,888                           35,096 
4      Titan Krios          K2      1,497 (1,579)                       200,292                          100,956 
5      Titan Krios          K2      997 (1,002)                         128,515                          54,574 
6      Titan Krios          K2      1,114 (1,114)                       139,309                          54,257 
7      Titan Krios          K2      1,014 (1,016)                       134,937                          50,334 
8      Titan Krios          K2      833 (852)                           184,222                          143,707 
9      Titan Krios          K2      901 (901)                           225,167                          146,349 
10     Titan Krios          K2      1,019 (1,019)                       248,518                          157,947 

b 
Data collection 
  EM equipment                         FEI Titan Krios 
  Voltage (kV)                         300 
  Detector                             Gatan K2 
  Particles                            191,848 
  Pixel size (Å)                       1.32 
  Defocus range (µm)                   1.0-2.0 
  Electron dose (e/Å²)                 50 (32 frames)/22 (frames 3-16) 

Model composition 
  Peptide chains                       54 
  Protein residues                     13,982 
  RNA chains                           3 
  RNA bases                            3,446 

Refinement 
  Resolution (Å)                       3.08 
  Map sharpening B-factor (Å²)         -65 
  R factor                             0.3040 
  Fourier shell correlation            0.7814 

Rms deviations 
  Bonds (Å)                            0.0054 
  Angles (°)                           0.9687 

Validation (proteins) 
  MolProbity score                     2.43 (96th percentile) 
  Clashscore, all atoms                3.44 (100th percentile) 
  Good rotamers (%)                    80.87 

Ramachandran plot 
  Favored (%)                          88.14 
  Outliers (%)                         3.46 

Validation (RNA) 
  Correct sugar puckers (%)            97.16 
  Good backbone conformations (%)      71.60 



CORRECTIONS & AMENDMENTS 


CORRIGENDUM 
doi:10.1038/nature17420 


Corrigendum: Observation of polar vortices in oxide superlattices 

A. K. Yadav, C. T. Nelson, S. L. Hsu, Z. Hong, J. D. Clarkson, 
C. M. Schlepütz, A. R. Damodaran, P. Shafer, E. Arenholz, 
L. R. Dedon, D. Chen, A. Vishwanath, A. M. Minor, L. Q. Chen, 
J. F. Scott, L. W. Martin & R. Ramesh 

Nature 530, 198-201 (2016); doi:10.1038/nature16463 

In this Letter, the surname of author Christian M. Schlepütz was 
incorrectly spelled “Schlepüetz”. This has been corrected in the online 
versions of the paper. 


CORRIGENDUM 
doi:10.1038/nature16997 


Corrigendum: Signalling thresholds and negative B-cell 
selection in acute lymphoblastic leukaemia 

Zhengshan Chen, Seyedmehdi Shojaee, Maike Buchner, 
Huimin Geng, Jae Woong Lee, Lars Klemm, Bjorn Titz, 
Thomas G. Graeber, Eugene Park, Ying Xim Tan, 
Anne Satterthwaite, Elisabeth Paietta, Stephen P. Hunger, 
Cheryl L. Willman, Ari Melnick, Mignon L. Loh, Jae U. Jung, 
John E. Coligan, Silvia Bolland, Tak W. Mak, Andre Limnander, 
Hassan Jumaa, Michael Reth, Arthur Weiss, Clifford A. Lowell 
& Markus Müschen 


Nature 521, 357-361 (2015); doi:10.1038/nature14231 


In Extended Data Fig. 3b of this Letter, 52 flow cytometry dot plots 
with double stainings for CD19 and ITIM-bearing receptors (PECAM1, 
LAIR1, CD300A and BTLA) were shown for 13 samples. The CD19- 
CD300A staining for sample ICN1 was inadvertently replaced with 
CD19-CD300A staining for sample PDX2. The Supplementary 
Information for this Corrigendum contains the corrected Extended 
Data Fig. 3b (showing the correct dot plot for sample ICN1). Our 
conclusions are not affected. 


Supplementary Information is available in the online version of the Corrigendum. 


138 | NATURE | VOL 534 | 2 JUNE 2016 




ERRATUM 
doi:10.1038/nature17622 


Erratum: Epithelial tricellular junctions act as interphase cell 
shape sensors to orient mitosis 

Floris Bosveld, Olga Markova, Boris Guirao, Charlotte Martin, 
Zhimin Wang, Anaëlle Pierre, Maria Balakireva, 
Isabelle Gaugue, Anna Ainslie, Nicolas Christophorou, 
David K. Lubensky, Nicolas Minc & Yohanns Bellaïche 

Nature 530, 495-498 (2016); doi:10.1038/nature16970 

In Fig. 1e of this Letter, the y-axis label incorrectly read ‘GFP-Mud 
intensity (TCJs per junction)’ instead of ‘GFP-Mud intensity (at TCJs 
relative to septate junctions)’. This has been corrected in the online 
versions of the paper. 
ILLUSTRATION BY THE PROJECT TWINS 


DIGITAL FORENSICS 
IN THE LIBRARY 


Archivists are borrowing and adapting techniques used in criminal 
investigations to access data and files created in now-obsolete systems. 


BY MARK WOLVERTON 


hen archivists at California’s 
Stanford University received the 
collected papers of the late palae- 


ontologist Stephen Jay Gould in 2004, they 
knew right away they had a problem. Many 
of the ‘papers’ were actually on computer disks 
of various kinds, in the form of 52 megabytes 
of data spread across more than 1,100 files — 
all from long-outdated systems. 

“It was a large collection, as you can 
imagine,” says Michael Olson, service manager 
for the Born Digital/Forensics Lab at 
Stanford University Libraries. “He used a lot 
of early word processing for his writing, lots of 
disks and diskettes in different formats.” 
After considerable effort the Stanford 
archivists did get Gould’s papers into order — 
first by finding hardware that could read the 
obsolete disks, and then by deciphering what 
they found there. “We had some challenges 
finding old applications to figure out what 
word processor he used, that sort of thing,” 
says Olson. 

The Gould papers were an early indication 
of an issue that’s been rapidly worsening: four 
decades after the personal-computer revolu- 
tion brought word processing and number 
crunching to the desktop, the first generation 
of early adopters is retiring or dying. So how 
do archivists recover and preserve what's left 
behind? 

“People around the world have informa- 
tion stored on disks that are less readable 
with every passing day,” says Christopher 
Lee, a researcher in the School of Information 
and Library Science at the University of 
North Carolina (UNC) in Chapel Hill. “This 
includes floppies, Zip disks, CDs, DVDs, 
flash drives, hard drives and a variety of other 
media.” Many files can be accessed only with 
long-obsolete hardware, and all are subject 
to physical deterioration that will ultimately 
make them unreadable by any means. By now, 
many libraries, archives and museums have 
accumulated shelves full of such material, 
stashed away in the hope that if it’s ever 
needed, somebody, somewhere will be able to 
figure out how to access it. 


DIGITAL INSPIRATION 

Increasingly, archivists are finding inspira- 
tion in the field of digital forensics: the art 
of extracting evidence about illicit activity 
from computer drives, smartphones, tablets 
or even GPS devices. “It turned out that law- 
enforcement and computer-security people 
were dealing with essentially the same prob- 
lems of stabilizing and recovering data from 
digital media,” says Matthew Kirschenbaum 
at the University of Maryland in College Park. 
And many of their solutions were directly 
applicable to the archivists’ needs. 

In law enforcement, for example, a top 
priority is to preserve material in its original 
form. This is often harder than it sounds: 
almost anything done on a computer, even 
something as innocuous as plugging in a 
USB drive, leaves a faint digital trace. So 
digital-forensics practitioners have developed 
techniques for creating an artefact-free ‘disk 
image’ that duplicates everything, down to 
the unused and hidden disk space. They can 
then preserve the integrity of the original for 
evidentiary purposes in court while doing all 
their forensic analysis on a perfect copy. 
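The copy-then-verify idea behind a disk image can be illustrated with a minimal Python sketch. It is not how any particular forensic package works: the function names (`make_image`, `verify_image`, `sha256_of`) are hypothetical, and an ordinary file path stands in for a device that a real workflow would normally attach through a hardware write blocker. The sketch copies every byte of the source, including runs of unused zero bytes, and records a SHA-256 digest so that later analysis can confirm the copy still matches the original.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time so large media need little memory


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while chunk := handle.read(CHUNK_SIZE):
            digest.update(chunk)
    return digest.hexdigest()


def make_image(source: Path, image: Path) -> str:
    """Copy every byte of `source` into `image`; return the digest of the source bytes."""
    digest = hashlib.sha256()
    # The source is opened read-only; only the image file is written.
    with source.open("rb") as src, image.open("wb") as dst:
        while chunk := src.read(CHUNK_SIZE):
            digest.update(chunk)
            dst.write(chunk)
    return digest.hexdigest()


def verify_image(image: Path, expected_digest: str) -> bool:
    """Check that the image on disk still matches the digest recorded at capture time."""
    return sha256_of(image) == expected_digest
```

Storing the digest alongside the image is the design choice that matters here: anyone who later works on the copy can rerun `verify_image` and show that not a single bit has changed since capture.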

Institutions working to decipher collections 
have the same need, although in their case, 
the object is to maintain the provenance of the 
original for future researchers. Creating foren- 
sic copies of the data was a relatively fringe 
idea 8 or 10 years ago, Lee says. “It’s now quite 
common in library and archive settings.” 

Unfortunately for archivists, however, disk 
imaging is usually done through commercial 
software packages such as the Forensic Toolkit 
made by AccessData in Lindon, Utah, or by 
EnCase, which is developed by Guidance 
Software in Pasadena, California. Because 
these packages are designed for criminal 
investigators, they include tools for file carv- 
ing (assembling complete files from frag- 
mentary data); cracking passwords; accessing 
encrypted files; advanced searching; and gen- 
erating reports for use in court — tasks that 
tend to be less important for archival pur- 
poses. These packages also come with licens- 
ing costs in the thousands of dollars, which 
would strain the budget of many collecting 
institutions. 

So in 2011, Lee and his colleagues launched 
BitCurator, a platform designed for the archi- 
val field, with funding from the Andrew 
W. Mellon Foundation, and with continued 
support from a consortium that currently 
encompasses 25 member institutions, includ- 
ing Harvard University, the Massachusetts 
Institute of Technology, Stanford University, 
Emory University and the British Library. 
It has the advantage of being open source 
and freely available for download 
(wiki.bitcurator.net). “It’s a combination of 
third-party open-source tools and our own 
work,” says Kam Woods, a research scientist 
at UNC’s School of Information and Library 
Science and co-principal investigator with Lee 
on the project. On the basis of the turnout at 
training sessions and other BitCurator events, 
Lee estimates that several dozen institutions 
now use the package actively, and several hun- 
dred more use it at least occasionally. 

BitCurator handles not only disk imaging, 
but also a number of other issues that criminal 
investigators don’t have to worry about. One 
example is redaction: editing out confidential 
material before publication. That’s an 
alien concept in criminal investigations, 
says Olson. “Why would you ever want to 
redact evidence from a case? But from an 
archival or library standpoint, you wouldn't 
want to make somebody’s health records 
available.” So BitCurator has to have meth- 
ods for access control that don't really exist 
in the forensics field. 

Another speciality of BitCurator is its 
ability to read long-outdated disks — an 
essential tool for archivists who are faced 
with stacks of old floppies or even reels of 
magnetic tape. Although digital-forensics 
investigators usually deal with newer-gen- 
eration systems, their techniques can still be 
quite useful for recovery, says Lee. “Taking a 
forensic approach, you can still create a safe 
copy of the data, even if you don't know what 
the file system is or you can't read it,” he says. 
“As long as you can attach a drive and get the 
bits off of it, you can create an image.” Archivists 
can then experiment with different ways 
to retrieve the files, safe in the knowledge that 
the original is not in danger. 

Some advantages to the forensics-based 
approach transcend technical considera- 
tions, says Olson. With the Gould archives, 
for example, “you can get timestamps from 
different word-processing files to see how he 
actually wrote something, a particular order 
that he wrote, a way that he edited. That’s 
really nifty if you're a researcher that wants 
to know how his mind worked.” 
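As a rough illustration of the timestamp idea, the following Python sketch lists the files under a directory in order of last modification. It assumes the files have already been extracted from a disk image with their original timestamps intact (ordinary copying can alter them, which is one reason to work from an image), and the function name `writing_timeline` is hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path


def writing_timeline(root: Path) -> list[tuple[datetime, str]]:
    """Return (modification time, relative path) pairs, oldest first."""
    entries = []
    for path in root.rglob("*"):
        if path.is_file():
            # st_mtime is the last-modified time recorded by the file system.
            modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
            entries.append((modified, str(path.relative_to(root))))
    return sorted(entries)
```

Printing the result for a folder of drafts gives a first approximation of the order in which they were last edited; it says nothing about what changed between versions, which would need a comparison of the file contents.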


SEARCH AND RESCUE 

The same techniques can be used for other 
purposes besides archiving. At Stanford, 
Olson’s lab is increasingly helping faculty 
members and students who need to access 
work that was born on now-outdated com- 
puter systems. “I had a graduate student about 
a year ago that came to us with an astrophysics 
data set on a Zip disk,” he says. “It was some- 
thing that their professor had created, that 
they weren’t able to read and needed to get 
to because it was part of their research. And 
nobody had really shepherded that to a new 
modern system.” The library was able to help 
the student do just that. 

Another recent example is Stanford’s long- 
running ME310 engineering course, which 
had a server full of design studies, presenta- 
tion slides and videos that students had com- 
pleted over the years as part of their graduate 
work. “The people running the programme 
wanted to preserve all the data from these 
projects,” says Olson, “but they needed help to 
recover the data, organize it and also get per- 
mission from the students to actually make 
this available.” 

Data are already being lost to science at 
a rapid rate. One study, for example, found 
that as little as 20% of data for ecology papers 
published in the early 1990s is still available 
(T. H. Vines et al. Curr. Biol. 24, 94-97; 
2014). Co-author Tim Vines, who now runs 
a peer-review service called Axios Review in 
Vancouver, Canada, says that the best way for 
scientists to preserve their data for future gen- 
erations is to upload it into library-maintained 
archives or open online repositories, such as 
Dryad or Figshare. 

“Putting it into the hands of an organization 
committed to preserving it is far better than 
putting it on a shelf,” he says. 
Postdoctoral researchers at the Ludwig Center at Johns Hopkins in Baltimore, Maryland, participate in an annual Halloween contest. 


TEAM BUILDING 


Morale boosters 


You can keep spirits up when the research doldrums hit. 


BY KENDALL POWELL 


Every early-career scientist has been there: 
months pass with no good news to 
report at a lab meeting. You can’t move 
on to the next phase of your project because you 
still need to complete a particular experiment 
or analysis. Dread overtakes you at the thought 
of facing yet another week of try and try again. 
Senior researchers know that this is the 
norm, not the exception. Making cutting-edge 
discoveries means that you may figuratively 
bash your head against a brick wall for many 
months before any true breakthroughs happen. 
But even the most resilient junior researchers 
can get depressed and frustrated when weeks 
of experiments leave them empty-handed. The 
best group leaders know how to keep morale 
flying high in the face of the research doldrums 
(see ‘Let them eat cake’). They use one-on-one 
meetings and progress reports to keep the 
wheels rolling, and lab outings, group-bonding 
activities and silly contests to keep the ‘fun’ in 
functional labs. Many also have strategies to 
‘normalize’ failure in their labs, so that researchers 
won’t hide or grow listless when projects 
aren't working. 

Principal investigators (PIs) and lab heads 
who want to maintain high spirits — and high 
productivity — in their labs need to keep in 
mind that morale flows from the top down, 
say veteran group leaders. It is crucial for PIs 
to foster community building in their lab to 
give team members a sense of belonging and to 
establish a support network that will see junior 
members through the roughest spots. 

“The success of people in the lab at certain 
times can elevate the success of everyone,” says 
Jeff Karp, who leads a bioengineering group at 
Brigham and Women’s Hospital in Cambridge, 
Massachusetts. He and other PIs make sure to 
acknowledge and celebrate successes publicly 
when they do arrive. “It’s important to have in 
place a high morale because it helps bring out 
the best in everybody,” he says. 


BREAKOUT TIME 
Stephen Royle, group leader at the University of 
Warwick, UK, knows that he is not necessarily 
the first person to whom his team members will 
turn when things aren't going well on a project 
in his cell-biology lab. So he tries to create and 
nurture a high level of trust and camaraderie in 
his group to establish a safety net that won't let 
any one person's woes reach crisis level. 

His group goes on regular lunch and laser- 
tag outings together, and he’s also dreamt up 
some friendly competitions to keep things 
light in the laboratory. Using the free pens, 
T-shirts and other trinkets gathered at scientific 
conferences for prizes, he runs a Lab Quiz event 
akin to pub quizzes. Lab members compete on 
trivia questions about the university and its host 
city of Coventry, and on little-known facts about 
clathrin, the lab’s favourite molecule. 

Royle’s lab members chart lab records — who 
can get the maximum yield from a DNA prepa- 
ration, say — on a whiteboard, and one wall 
hosts the ‘Western blot Hall of Shame’ with horribly 
smeared protein samples. Such a display is 
a great equalizer and morale-builder, Royle says 
— it shows students and junior lab members 
that even the most accomplished senior team- 
mate with a stellar publication record can run 
experiments that deliver rubbish results. He also 
likes to tell his team about past blunders of big 
names — such as how Nobel laureate Roderick 
MacKinnon broke the electrode of a pH meter 
during his first days of undergraduate research. 

It’s important to “de-pathologize” the idea of 
failure in science, says astronomer Keivan Stas- 
sun, senior associate dean for graduate educa- 
tion and research at Vanderbilt University’s 
College of Arts and Science in Nashville, Ten- 
nessee. Instead, you teach students that failures 
are more akin to having writers’ block, a normal 
part of the process that everyone runs into. It is 
imperative, he adds, that no one feel isolated in 
their struggles. 

With that in mind, many lab heads say that 
they try to flatten hierarchies and promote a 
connected, cooperative environment. One 
hierarchy-shattering activity that Karp uses is 
a three-minute presentation competition. Eve- 
ryone in the lab gives a quick talk on any topic 
they choose — these have ranged from the best 
hamburger joints in the region to historical fig- 
ures from India — and then gets feedback. The 
group votes on the best presentation and the 
best critique, and each winner receives a useful 
electronic gizmo, such as a slide pointer. 

To promote collegiality, some PIs take their 
team-building ventures outdoors. When cell 
biologist Anne Straube was asked to organize a 


Anne Straube runs Cake Club in her department. 
LET THEM EAT CAKE 


Good ways to wind down at the end of the week 


At the end of a dispiriting week in the lab, 

it doesn’t hurt to have something to look 
forward to as a morale lift. For PhD student 
Alice Bachmann, that is Cake Club — a 
weekly meeting of researchers in her building 
that is centred around the simple act of 
eating scrumptious cake. “Always on Fridays. 
It’s a good way to finish the week with a high 
sugar level and high level of happiness,” says 
Bachmann, who studies cell biology at the 
University of Warwick, UK. 

“I want to keep people sane,” says Anne 
Straube, Bachmann’s adviser. “After they've 
spent a week in the microscope room 
counting dots on the computer screen, they 
need to break out of this.” Although she 
doesn’t quite remember how Cake Club 
started, she thinks that excelling at recipes in 
the kitchen helps lab members to hone their 
experimental skills, too. 

Creations have included cakes in the 
shape of a brain, a green fluorescent 


social activity for her department's retreat, she 
warned them that it could get muddy — she’s 
a fan of orienteering, which can mean run- 
ning through ditches or alongside creeks. 
She designed a course on her campus at the 
University of Warwick: teams of 3 had to find 
the locations of 45 picture clues. Once people 
have collective fun and get to know each other, 
it makes it easier to ask for advice or a reagent, 
she says. “It breaks down the hierarchies and lets 
them be less serious about things.” 

Running two lab groups — one in Uppsala, 
Sweden, and one in Milan, Italy — means that 
cancer researcher Elisabetta Dejana must work 
doubly hard to ensure cohesiveness. She holds 
lab meetings once a week over Skype, and in 
January, organized a retreat for everyone to 
meet and gather in Milan. She hosted both 
groups at her home for pizza, gelato and wine, 
and took them on hikes. 

The informal gathering tightened group 
bonding and eased the tension over competi- 
tion between the groups. “They are exchanging 
messages and mice and cells,” she says. “Now, 
they understand that working together will help 
everyone.” When peer reviewers send back a 
research paper with calls for new experiments, 
she divides the work up among the group, which 
makes the revisions go more quickly and adds 
energy to the team, she says. 


CAREFUL SCREENING 

Creating a strong sense of community often 
starts with the recruitment process. Cancer 
researcher Bert Vogelstein accepts only appli- 
cants whom he thinks will be able to hold up 
emotionally and psychologically through the 
cheesecake and a cake decorated like the 
page of a PhD thesis. Members are also 
encouraged to experiment and bring their 
failures (which are still edible, after all). 

‘Vino’, a social hour with refreshments, 
ends the week at the Vanderbilt Initiative in 
Data-Intensive Astrophysics at Vanderbilt 
University in Nashville, Tennessee. The 
gathering often includes a toast to research 
successes — getting a paper accepted, a 
fellowship awarded or grants funded. 

The Vino room contains no projector or 
whiteboard, so no one can inadvertently 
slip too far into shop talk, and the informal 
chatting makes advisers and other 
professors more approachable for trainees. 
“We’ve been collectively working hard all 
week. Hopefully, we’ve had some glimmers 
of success, but we mostly all experience 
a whole lot of slog and tribulation,” says 
Vanderbilt astronomer Keivan Stassun. “And 
we all deserve a beer.” K.P. 


challenges of bench science. During interviews, 
he probes candidates’ mindsets: asking them 
how long they expect to work on their project, 
for example, and whether they have failed at 
anything before. “If they say, ‘No’, that’s a conversation 
ender — they are not being honest with 
themselves,” Vogelstein says. He also phones 
their previous mentors to glean detailed infor- 
mation about the candidates’ lab experiences 
and how they handled setbacks. 

Vogelstein and Kenneth Kinzler — who 
co-direct the Ludwig Center at Johns Hopkins 
in Baltimore, Maryland — run their group of 
about 15 trainees as subgroups of 2 or 3 people. 
The subgroups function as a risk-mitigation 
plan: all the members are co-authors on any 
papers. So even if one member has a particu- 
larly difficult project that takes three years to 
publish, she or he will have other papers come 
out in the interim. 

And there are team rewards. Whenever 
work in the lab generates intellectual property, 
everyone in the lab benefits financially from any 
royalties. “Establishing a group that can cheer 
when someone else succeeds is not an accident,” 
Vogelstein says. “It requires structure and plan- 
ning of how people appreciate each other.” 

But even within a bonded community, 
trainees will get stuck at some point. Weekly 
check-ins or progress reports can prevent any 
hiding of problems. Lab heads can use software 
such as Trello or Slack, which allows everyone 
on a project to see progress — or lack thereof. 
Vogelstein has a pre-emptive approach. “Don't 
wait till they get stuck! The time to intervene is 
way before someone is despondent.” 

Switching a stuck lab member to a project 
with better chances of success is a common 
strategy. But Stassun says that sometimes the 
best option is just to push through. “The sin- 
gle most important thing you can do is pull 
the bedcovers back and walk out the door,” 
he says. He uses ShareLaTeX, a shared online 
text-editing application, as a quick way to 
check for stalled project manuscripts. Stuck 
trainees should find one thing that they can 
write down, he says — maybe it’s one para- 
graph describing an experimental set-up or 
one paragraph of the introduction explain- 
ing a piece of background research. 

Some senior scientists like to remind 
more-junior researchers that everyone gets 
mired at some point — it’s how they handle 
it that determines their success. When third- 
year PhD candidate Alice Bachmann, a 
member of Straube’s lab, got stuck for nearly 
18 months on how best to prove that she 
had depleted a protein from her rat cells, 
she recalled the mantra of a friend: “If Plan 
A is not working, there are 26 letters in the 
alphabet.” Plan D ended up working, after 
she repeated it many times. Astrophysicist 
Rodolfo Montez Jr, a support scientist for 
the Chandra mission at the Smithsonian 
Astrophysical Observatory in Cambridge, 
Massachusetts, notes that he spends most of 
his research time running up against ‘bugs’ 
when computing an equation for an astro- 
nomical observation. “Solving the bug is 
now what your job is. It’s not a nuisance, it’s 
part of the path, and it’s cool,” he says. 

That psychology of ‘flipping’ failure on its head is a 
recurring theme among lab leaders: embrace the failures, 
embrace the spinning wheels, embrace the bad 
weather at the field 
station. These things force researchers to get 
more creative and to approach problems in 
fresh ways. That, the leaders add, is when 
true discovery often happens. 

Ultimately, the research enterprise works 
best when energy and enthusiasm remain 
high, even in the face of rejection, failure and 
defeat. “Keeping morale up in the lab is one 
of the most important aspects of trying to 
succeed,” says Vogelstein, whose group has 
identified more than a dozen major cancer 
genes, including the most common culprits 
in colon cancer. 

“Succeeding in science is difficult,” he says. 
“We are always competing: for publications, 
for grants, for experiments. We are fighting 
battles all the time. The best thing to hear 
in the lab are the words, ‘It worked!’ You 
might think after 35 years it gets old, but it 
doesn’t.” 




Kendall Powell is a freelance writer based 
in Lafayette, Colorado. 


TURNING POINT 


Nanotech bridge 


Hossam Haick, a nanotechnology researcher at 
the Technion-Israel Institute of Technology in 
Haifa, has developed devices to detect cancer 
using exhaled breath rather than through 
biopsies. But, he explains, life in Israel can be 
difficult for Arab scientists. He is therefore trying 
to use his science to bridge cultural boundaries. 


How many Arab professors are at the Technion? 
Out of 600 faculty members, there are 9 Arab 
professors. The Arab community in Israel is 
20% of the population, but in academia, it is 
roughly 1%. There is a pervasive belief in Israel 
that the Arab community is not educated. I try 
to dispel that notion. 


How did you get the idea to use breath to 
diagnose disease? 

I read a lot of the history. From the ancient 
Greeks 2,400 years ago to Alexander Gra- 
ham Bell in the early 1900s, there were long- 
standing hypotheses of the smell of chronic 
disease in breath. I also heard hypotheses 
that dogs could smell cancer. I decided to see 
if I could prove scientifically whether there 
is something about exhaled breath that can 
reveal signs of disease. 


At what stage is the research? 

We have shown that exhaled breath contains 
unique fingerprints of specific diseases. We 
have lab results, as well as animal experi- 
ments. We have run clinical studies with 
5,000 patients across 19 departments and 
9 institutions, where we collected breath with 
a small device called NaNose, which is able to 
detect more than 1,000 different compounds in 
the breath from all of these people. We started 
with lung cancer, and have extended studies to 
gastric, colorectal and breast cancers, as well 
as to degenerative disease such as Parkinson's, 
Alzheimer’s and multiple sclerosis. In the case 
of lung cancer, we are able to discriminate 
between benign and malignant tumours with 
88% accuracy. By using breath to discriminate 
between benign and malignant tumours, we 
could save people from having to undergo 
unnecessary biopsies and surgeries. 


Will this be in use soon? 

We have built technology in a portable device 
that can detect disease in an easy and inex- 
pensive way — only a few thousand dollars. 
Three companies have obtained the licence 
from Technion. 


How else are you building on this technology? 
We are also working on Sniffphone. The idea 
is to bring breath analysers into smartphones. 
If a risk is found, the smartphone could send 
the results to a physician. 


How do you try to improve relations between 
Arabs and Jews in Israel? 

In a research institute, you are judged on excellence 
and achievements. That’s not the reality 
outside an academic institute, however. And in 
the Arab sector, there is a belief that whatever 
you do, you will not excel in Israel, unfortu- 
nately. As scientists, our role in the community 
is not only to produce papers. We have to dis- 
seminate the results and provide a message for 
the community. I volunteer to go to many com- 
munity schools, both Jewish and Arab, to talk 
about science. These efforts consume a lot of 
my personal time. Every year, I give 200 lectures 
in schools. This is huge, considering I lead three 
research consortia. But it’s important. 


How else do you pursue outreach? 

A professor at the Technion had the idea 
that we should disseminate the fundamen- 
tal principles behind nanotechnology and 
nanosensor research. We decided to present 
the course in a digital way, to talk to people 
beyond the boundaries of Israel. I said I would 
only do it if courses were offered in English 
and Hebrew, as well as in Arabic, so as not to 
discriminate. We have many people from Arab 
countries that have taken the course. One of 
the nicest data points is that 900 of those peo- 
ple who took my course were from Iran, which 
has huge political sensitivities with Israel. My 
students come from 127 countries worldwide, 
an indication that education, science and technology 
can serve for peace. 


INTERVIEW BY VIRGINIA GEWIN 


This interview has been edited for length and clarity. 
SCIENCE FICTION 


WHEN THE COLD COMES 


BY DEBORAH WALKER 


Before the funeral, I send seven sealed 
letters to the Unwalled Cities asking 
them to fulfil their obligations 
and send their best students to the Disease 
University and the Quarantine Security 
Academy. This is not the first 
request I’ve sent. I doubt the 
cities will honour their respon- 
sibilities, but the sealed letters 
will serve their purpose. Seven 
seals. If I were a Doctrinist, I 
might find some significance 
in that. 

Although in Isolation Theta 
we are, in the main part, 
atheists, we appreciate cere- 
mony as much as any Earth 
Doctrinist. The great and the 
good are gathered to attend 
the funeral of Dr Olinda Troy, 
the fourteenth Commander 
Pathologist of our colony. 

Her coffin is empty. She’s 
donated her body to the 
Disease U. I'm grateful. We're 
short of corpses, and my labs 
need bodies to test the latest 
antivirals. The Interferon- 
Zeconaril hybrid may, given 
time, prove effective. 

When it’s my turn to speak, 
I talk about an old Earth fable. The feck- 
less grasshopper spent her summer playing 
the mandolin, while the ant toiled to build 
a store of food. Winter comes, as it always 
does, and in the cold the hungry grasshopper 
begs the ant for food and is refused. The food 
is for the ant and for her family. Hard work, 
sacrifice and planning are the ant’s virtues. 

“Dr Troy was an ant,” I tell the congregation. 
“When the cold virus, Rhinovirus 
HRV-A488, mutated into the Bleeding Eyes 
serotype, she was ready. She activated the 
city walls. She marshalled the quarantine 
police. She enforced the daily testing regime, 
and expelled the infected: man and woman 
and child. She ensured our survival. Only 
through the harshest measures can we endure 
the Cold. And afterwards, she continued the 
fight, developing weapons against the myriad 
serotypes that wait for us. Through her will, 
she saved this colony. As the fifteenth Commander 
Pathologist, I will do the same, if 
called to do so.” 

Be prepared. 


My eulogy is received in silence. There 
is a new ethos in the city. The next speaker 
tells it. “Compassion is the hallmark of this 
colony. We pity the scientists who genemodded 
the common-cold viruses. We pity 
the doctors who undertook the clinical trial. 
We pity the lab workers who manufactured 


the viruses. We pity the physicists who plot- 
ted the course of the bioweapon. We pity 
the Doctrinist Senate that approved the 
decision. We pity every Doctrinist man and 
woman who condoned that act to send the 
bioweapon hurtling after the people fleeing 
Earth. We pity them all.” 

It’s a grasshopper sentiment, as Dr Troy 
often remarked. She’s dead. She'll not have to 
witness another epidemic. There is my pity. 

The service ends, the grasshoppers move 
out quickly. We ants are slower, older. 

I pass two grasshoppers. The woman 
clings to her husband. “I can’t believe Earth 
sent the Cold after us. What hatred they 
had.” 

“I pity them,” says the man sanctimoniously. 

Pity is a fine thing. But do the grass- 
hoppers think it will save them? They’re 
lulled because the Cold has been quiet for 
half a century. Yet there is no vaccination 
against the Cold, and little to no cross- 
protection against the serotypes. Earth sent 
us at least 500 serotypes, highly contagious 
and genemodded to have fatal potentiality. 
Repeated mutations, owing to low-fidelity 
replication and frequent recombination, 
mean that at any time antigenic shift may 
occur, and a new, fatal pandemic may arise. 
Witness the previous epidemics: Bleeding 
Eyes, The Judders, Heart Halt, Dry Dust 
Brain, The Weeping Trembles. 

We cannot vaccinate against 
500 constantly mutating 
strains, but I believe the anti- 
viral hybrids will eventually 
generate a cure. 

The grasshoppers, however, 
want to divert our resources 
to other things. Wouldn't it be 
fine to build another ship, they 
argue? To go off planet. 

Of the walled cities we built 
before the attack, only Theta 
stands. Our colony ship was 
stationed at Epsilon. In the 
chaos of our first pandemic, 
the citizens of Epsilon fled the 
planet. But they took the Cold 
aboard, virions harboured in 
the air they breathed. 

We never heard from the 
colony ship. Why would a new 
ship be any different? 

I walk slowly to the labs. 
Sometimes, and this is a fan- 
ciful conceit, I imagine the 
Cold as an entity. A spectre that’s staying its 
hand. It knows that if it attacks in 20 years, 
it will find the cities overflowing with grass- 
hoppers, full of pity, unwilling to do what is 
necessary. The Cold tires of the contest and 
in a few decades it will have its endgame. 

A fancy only. A virus has no will. There 
was only the will of the Doctrinists who cre- 
ated it and set out to destroy us non-believ- 
ers. And there is only my will that will keep 
the game in play. 

I have sent seven sealed letters. When 
the seals are opened, a new pandemic 
will be born. The Cold will come, killing 
grasshoppers and ants alike. But I will be 
here. I will ensure that we endure. I will set 
a new generation of ants working towards 
the cure. 

I am the sender of the seals. I am the harbinger 
of the plague. I am the doctor who 
sends Cold into our world. 

Do not pity me. 


Find Deborah in the British Museum 
trawling the past for future inspiration. 